Preface



After three decades of research and practice, reuse of existing software artefacts remains 
the most promising approach to decreasing effort for software development and evolu- 
tion, increasing quality of software artefacts and decreasing time to market of software 
products. Over time, we have seen impressive improvements, in extra-organizational 
reuse, e.g. COTS, as well as in intra-organizational reuse, e.g. software product families. 

Despite the successes that we, as a community, have achieved, several challenges 
remain to be addressed. The theme for this eighth meeting of the premier international 
conference on software reuse is the management of software variability for reusable 
software. All reusable software operates in multiple contexts and has to accommodate the 
differences between these contexts through variation. In modern software, the number of 
variation points may range in the thousands with an even larger number of dependencies 
between these points. Topics addressing the theme include the representation, design, 
assessment and evolution of software variability. 

The proceedings that you are holding report on the current state of the art in
software reuse. Topics covered in the proceedings include software variability,
testing of reusable software artefacts, feature modeling, aspect-oriented software
development, composition of components and services, model-based approaches and
several other aspects of software reuse.

May 2004 Jan Bosch 
Abstract. Selecting components that satisfy a given set of requirements is a key
problem in software reuse, especially when reusing across different domains of
functionality. This concern has been treated in the ARIFS methodology, which
provides an environment to reuse partial and formal requirements specifications,
managing the variability implicit in their incompleteness. In this paper,
we define generic incomplete specifications to introduce an explicit source of
variability that allows reusing models across different domains, accommodating
them to operate in multiple contexts. An extended formal basis is defined to
deal with these tasks, which entails improvements in the reuse environment.



1 Introduction 

Reusability has been widely suggested to be a key to improving software development
productivity and quality, especially if reuse is tackled at early stages of the life cycle.
However, while many references in the literature focus on reuse at late stages (basically
code), there is little evidence to suggest that reuse at early stages is widely
practiced. The ARIFS methodology [1, 2] (Approximate Retrieval of Incomplete and
Formal Specifications) deals precisely with this concern, providing a suitable framework
for reusing formal requirements specifications. It combines the well-known advantages
of reusing at the requirements specification stage with the benefits of a formal
approach, avoiding the ambiguity problems derived from natural language.

In an incremental process, the elements found at intermediate stages are characterized
by their incompleteness. ARIFS provides a formal treatment of the variability
inherent to this incompleteness. It covers the prospects of vertical reuse, i.e., reuse
within the same domain of functionality. In this paper, we go one step further, to fully
comply with the usual definition of software variability as “the ability of a software
artifact to be changed or customized to be used in multiple contexts” [12]. Our proposal
is to support an explicit form of variability that allows reusing models across
different domains. We do this by defining generic formal requirements specifications
and extending ARIFS’ formal basis to classify and retrieve them.



* Partially supported by PGIDT01PX132203PR project (Xunta de Galicia) 

J. Bosch and C. Krueger (Eds.): ICSR 2004, LNCS 3107, pp. 1-10, 2004. 

© Springer-Verlag Berlin Heidelberg 2004
The paper is organized as follows. Sections 2 and 3 describe the life cycle in which
ARIFS is applied and the ARIFS reuse environment, respectively. Section 4 introduces
generic components and defines the basis for their classification and retrieval.
Section 5 includes a simple example showing the advantages of the new proposal, compared
to the original reuse environment. Section 6 discusses other relevant work within the
scope of the paper and describes our future lines of work. Finally, Section 7 gives a
brief summary.



2 The SCTL-MUS Methodology 

SCTL-MUS [7] is a formal and incremental methodology for the development of dis- 
tributed reactive systems, whose life cycle (Fig. 1) captures the usual way in which 
developers are given the specification of a system: starting from a rough idea of the 
desired functionality, this is successively refined until the specification is complete. 




Fig. 1. The SCTL-MUS life cycle

Specifications are elaborated at the “Initial goals” stage. The requirements stated
up to any given moment (in box “SCTL”) are used to synthesize a model or prototype
(in box “MUS”). When it is ready, a model checker verifies the global objectives
(properties) of the system (“SCTL-MUS verification”) to determine whether the model
satisfies them; whether it cannot satisfy them, neither in the current iteration nor in
future ones (inconsistency); or whether it does not satisfy them yet, but may do so in
future iterations (incompleteness). In case of inconsistencies, the user is given
suggestions to solve them. Then, by animating the MUS prototype (“User validation”), the
user can decide whether the current specification is already complete or more iterations
are needed. Upon completion, the system enters the “Implementation” stage. Here, the
prototype is translated into the LOTOS process algebra [5] to obtain an initial
architecture, which is progressively refined until it can be translated semi-automatically
into code.

SCTL-MUS combines three formal description techniques. First, the many-valued
logic SCTL (Simple Causal Temporal Logic) is used to express the functional requirements
of a system, following the pattern Premise ⇒⊗ Consequence. Depending
on the temporal operator, the consequence is assessed on the states where the premise
is defined (⇒ or “simultaneously”), their predecessors (⇒⊖ or “previously”) or their
successors (⇒○ or “next”). Second, the graph formalism MUS (Model of Unspecified
States) is a variation of traditional labeled transition systems (LTS), used to obtain
system prototypes in terms of states and event-triggered transitions. Finally, the
LOTOS process algebra is used to express the architecture of the developed system.

In an incremental specification process, it is essential to differentiate functional 
features that have been specified to be false (impossible) from those about which 
nothing has been said yet. SCTL introduces this concept of unspecification by adding
a third value to the logic: an event can be specified to be true (1) or false (0), being
not-yet-specified (1/2) by default. Analogously, MUS graphs support unspecification in
both states and events, thus being adequate to model incomplete specifications.

Unspecification entails that intermediate models have degrees of freedom, so that
they can evolve into potentially many systems. Therefore, unspecification implies
variability. The incremental specification process makes the system under development
lose unspecification at each iteration, by evolving not-yet-specified elements
into true or false ones, until it eventually becomes the desired system. ARIFS was defined
to take advantage of this implicit form of variability for the purposes of reuse.
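
The loss of unspecification across iterations can be illustrated with a minimal sketch. It assumes that a specification is reduced to a plain mapping from event names to one of the three truth values, which is a simplification introduced here and not the SCTL-MUS representation; the function name `refines` is likewise hypothetical.

```python
from fractions import Fraction

# Truth values as described in the text: false (0), not-yet-specified (1/2), true (1).
FALSE = Fraction(0)
UNSPECIFIED = Fraction(1, 2)
TRUE = Fraction(1)


def refines(later: dict, earlier: dict) -> bool:
    """Return True if `later` keeps every decision already taken in `earlier`.

    Events missing from a mapping are treated as not-yet-specified (assumption).
    Only not-yet-specified events are allowed to change between iterations.
    """
    for event, value in earlier.items():
        if value != UNSPECIFIED and later.get(event, UNSPECIFIED) != value:
            return False
    return True


iteration_1 = {"a": TRUE, "d": UNSPECIFIED}
iteration_2 = {"a": TRUE, "d": FALSE}      # d is now specified as impossible
iteration_bad = {"a": FALSE, "d": FALSE}   # contradicts the earlier decision on a
```

Under these assumptions, `refines(iteration_2, iteration_1)` holds, whereas `refines(iteration_bad, iteration_1)` does not, since an event already specified as true has been changed.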



3 The ARIFS Reuse Environment 

ARIFS provides for the classification, retrieval and adaptation of reusable compo- 
nents in SCTL-MUS. Its objectives are twofold: (i) to save specification and synthesis 
efforts, by reusing suitable incomplete specifications; and (ii) to reduce the extensive 
resources needed to check (at every single iteration) the consistency of medium to 
large specifications, by reusing previously obtained formal verification results. 

For these purposes, a reusable component is defined by: (a) its functional specifi- 
cation, expressed by a set of SCTL requirements and modeled by the corresponding 
MUS graph; (b) verification information summarizing the levels of satisfaction of the 
properties that have been verified on it; and (c) an interface or profile, used for classi- 
fication and retrieval, that is automatically extracted from the functional specification. 

Classifying Reusable Components. The idea behind classification is that “the closer
two reusable components are classified, the more functional similarities they have”.
According to this, we have defined four criteria to identify semantic and structural
similarities [2]. This section describes those relevant for this paper: the TC^∞ and NE^∞
functions, whose results are included in the profile of the reusable components.

The TC^∞ function offers a semantic viewpoint of reusable components. It associates
with every MUS graph g ∈ G a set TC^∞(g) that contains sequences of events
linked to its evolution paths. It follows the traditional complete-traces semantics [11],
although it also reflects non-finite evolution paths and considers both true and false
events, to differentiate these from not-yet-specified ones. On the other hand, NE^∞ offers
a structural viewpoint: given a MUS graph g ∈ G, it returns a set NE^∞(g) reflecting
the number of transitions that the model makes through every evolution path.

For each O ∈ {TC^∞, NE^∞}, an equivalence relation ≡_O ⊆ G × G is defined, such
that g ≡_O g' ⇔ O(g) = O(g'). This organizes reusable components into equivalence
classes of components indistinguishable using O-observations. There is also a preorder
relation ⊑_O ⊆ G × G, given by g ⊑_O g' ⇔ O(g) ⊆ O(g'), that establishes a partial
order between equivalence classes, so that the set of equivalence classes ordered by
⊑_O is a partially ordered set.
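
The two relations can be sketched in a few lines, assuming that each profile is approximated by a finite set of strings. This is a simplification made for illustration: the actual TC^∞ and NE^∞ observations also describe non-finite evolution paths. The helper names and the sample profiles are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(profiles: dict) -> list:
    """Group component names whose observation sets are identical."""
    groups = defaultdict(list)
    for name, observations in profiles.items():
        groups[frozenset(observations)].append(name)
    return sorted(sorted(names) for names in groups.values())


def included(profiles: dict, g: str, h: str) -> bool:
    """Preorder check: every observation of g is also an observation of h."""
    return set(profiles[g]) <= set(profiles[h])


# Hypothetical profiles, each a finite set of observed event sequences.
profiles = {
    "g1": {"ab", "ac", "ad"},
    "g2": {"ab", "ac"},
    "g3": {"ac", "ab"},
}
```

Here `g2` and `g3` fall into the same equivalence class, and both are included in `g1` but not the other way round.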



Example 1. The following figure shows the result of applying the TC^∞ and NE^∞
functions to a MUS graph, g_1, that has four different evolution paths: (i) it can evolve
through event b to a final state where d is not possible; (ii) it can reach another final
state through events a and c; (iii) it can enter an endless loop through event a and an
infinite number of events e; and (iv) it can reach a final state through event a, any
number of events e and then event c.




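$$TC^{\infty}(g_1) = \left( b\,\neg d,\; ac,\; a(e)^{+},\; a(e)^{+}c \right)$$
$$NE^{\infty}(g_1) = \left( 1,\; 2,\; (2)^{+},\; (3)^{+} \right)$$

$$TC^{\infty}(g_2) = \left( ac,\; b \right)$$
$$NE^{\infty}(g_2) = \left( 1,\; 2 \right)$$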



All these possibilities are reflected in TC^∞(g_1), where the + notation means that the
sequences of events inside the parentheses can be repeated any number of times. From
the NE^∞ point of view, for the evolution paths enumerated above, the system makes
one, two, at least two and at least three transitions, respectively. On the other hand, it is
easy to see that the MUS graph g_2 is NE^∞- and TC^∞-included in g_1.



The relationships among MUS graphs extrapolate directly to the corresponding reusable
components. So, they allow organizing a repository of reusable components in
two different lattices: the NE^∞ lattice, intended for structural similarities (horizontal
reuse) and the TC^∞ lattice, for semantic ones (vertical reuse).



Retrieving Reusable Components. The variability discussed at the end of Sect. 2
allows the retrieval process to be based on functional proximity instead of on
functional equivalence. Taking this into account, ARIFS performs approximate
retrievals. Queries represent functional patterns of the SCTL statements that describe
the system being developed. Actually, they are defined in the same terms as the
profiles of the reusable components (NE^∞ and TC^∞ patterns), so the equivalence and
partial orderings defined above also hold between queries and reusable components.

For efficiency reasons, the retrieval process is split into two steps. In the first step,
the adjacent components of the query in the NE^∞ and TC^∞ lattices are located. In the
second, the search is refined by quantifying the functional differences between those
components and the query, in order to minimize the adaptation efforts. In the case of
NE^∞, differences are measured in terms of a numerical distance. As for TC^∞, two
functions assess semantic differences: the functional consensus and the functional
adaptation. A detailed description of these aspects can be found in [2].

Finally, the user is given the choice between the component closest to the query
according to the NE^∞ criterion, and the closest according to TC^∞. This selection cannot
be fully automated because the adaptation cost is expressed differently in each case.
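
As a rough illustration of the two-step retrieval for a single criterion, the sketch below models profiles as finite sets of trace strings. Several points are assumptions made only for this example: the helper names `adjacent` and `closest` are hypothetical, "adjacent" is simplified to "comparable by inclusion", and the size of the symmetric difference is merely a placeholder for the numerical distance, functional consensus and functional adaptation measures defined in [2].

```python
def adjacent(query: frozenset, repository: dict) -> list:
    """Step 1 (simplified): components whose profile includes, or is included in, the query."""
    return sorted(
        name for name, profile in repository.items()
        if profile <= query or query <= profile
    )


def closest(query: frozenset, repository: dict):
    """Step 2: among the candidates of step 1, pick the one with the fewest differences.

    The difference count is a placeholder metric; ties are broken by name.
    Returns None when no component is comparable with the query.
    """
    candidates = adjacent(query, repository)
    if not candidates:
        return None
    return min(candidates, key=lambda name: (len(repository[name] ^ query), name))


# Hypothetical repository and query.
repository = {
    "C1": frozenset({"ab"}),
    "C2": frozenset({"ab", "ac", "ad", "ae"}),
    "C3": frozenset({"ab", "ac", "ad", "ae", "af", "ag"}),
    "C4": frozenset({"xy"}),
}
query = frozenset({"ab", "ac", "ad"})
```

With this data, `C4` is discarded in the first step because it is not comparable with the query, and `C2` is selected in the second step because it differs from the query in a single trace.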

Reusing Verification Information. The idea behind the reuse of verification information
in ARIFS is that useful information about the system being developed can be
deduced from reusable components that are functionally close to it. With this aim,
each reusable component stores verification results about every property that has been
verified on it. We have proved that, for any two TC^∞-related components, interesting
verification information can be extracted from one to the other, helping to reduce the
amount of formal verification tasks throughout the specification process. The results
of this work are summarized in [1].



Supporting Software Variability by Reusing Generic Incomplete Models 






4 Defining Generic Components 

Practical experience with ARIFS has revealed some features that can be improved. As
shown in Fig. 2, an NE^∞-based retrieval may return wildly different components,
hard to adapt to the desired functionality despite being NE^∞-equivalent to the query.





Fig. 2. Limitations of the NE^∞ and TC^∞ criteria

On the other hand, the TC^∞ criterion can only return components whose transitions
are labeled the same way as the query. Thus, in the situation depicted in Fig. 2, the
TC^∞ search using the pattern {ab, ac} would not find any suitable component. However,
it is easy to notice that one of the components in the repository would be TC^∞-equivalent
to the query under the mapping (a → i, b → j, c → k). What is more, any
property R' verified on that component gives the same verification results as R on the
current specification, where R' is obtained from R by the same mapping. So, the
verification information linked to that component is potentially useful for the system
being developed. However, it cannot be reused with TC^∞, because this criterion is too
closely tied to the domain of functionality of every particular component.

Generic components are introduced to address these problems. They have the same
information as classic components (Section 3), but with the functionality and the
verified properties expressed in terms of meta-events. Meta-events are identifiers such
that different meta-events of a generic component must represent different events. To
deal with them, we introduce a new criterion: MT^∞.

MT^∞ associates with every MUS graph g ∈ G a set MT^∞(g) that contains sequences
of meta-events linked to its evolution paths. Two graphs g and g' are MT^∞-equivalent
(g ≡_MT g') iff a one-to-one mapping between the actions of g and g' exists
such that, having done the mapping, g^map ≡_TC g'. Analogously, a graph g is MT^∞-included
in another graph g' (g ⊑_MT g') iff all the actions of g can be mapped to a different
action of g' in a way that, having done the mapping, g^map ⊑_TC g' (see Fig. 3).
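
The equivalence check can be sketched as a brute-force search over one-to-one action mappings. The sketch assumes finite trace sets with one character per action, which leaves out the non-finite paths and false events handled by the actual criterion; the function names are hypothetical. Its cost grows factorially with the size of the alphabet, so it is meant only to make the definition concrete.

```python
from itertools import permutations


def alphabet(traces):
    """Collect the distinct actions used in a set of traces (one character per action)."""
    return sorted({action for trace in traces for action in trace})


def rename(traces, mapping):
    """Apply an action renaming to every trace."""
    return {"".join(mapping[action] for action in trace) for trace in traces}


def mt_equivalent(traces_g, traces_h):
    """Search for a one-to-one action mapping that makes the two trace sets equal.

    Returns the mapping as a dict, or None if no such mapping exists.
    """
    actions_g, actions_h = alphabet(traces_g), alphabet(traces_h)
    if len(actions_g) != len(actions_h):
        return None
    for candidate in permutations(actions_h):
        mapping = dict(zip(actions_g, candidate))
        if rename(traces_g, mapping) == set(traces_h):
            return mapping
    return None
```

For the pattern {ab, ac} mentioned above and a component with traces {ij, ik}, `mt_equivalent({"ab", "ac"}, {"ij", "ik"})` finds the mapping a → i, b → j, c → k, while no mapping exists for a component with traces {ij, jk}.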

An ordering exists (Eq. (1)) such that, if two components are TC^∞-related, they are
MT^∞- and NE^∞-related in the same way: C ⊑_TC C' ⇒ C ⊑_MT C' ⇒ C ⊑_NE C'.

$$NE^{\infty} \preceq MT^{\infty} \preceq TC^{\infty} \tag{1}$$

Our proposal in this paper is to organize the repository in a single lattice of generic
components, using the MT^∞ relations. MT^∞ merges structural and semantic viewpoints
in a convenient way: it is much less permissive than NE^∞ in identifying structural
similarities, and abstracts the domain of functionality by considering generic actions.
This introduces an explicit form of variability in the alphabet of actions of a reusable
component. We also propose MT^∞ to be the only criterion to conduct the retrieval
process, automating the decision of which component to reuse; and the profile of reusable
components to be formed by the results of the NE^∞ and MT^∞ functions.
Fig. 3. g is MT^∞-included in g'



5 A Simple Case Study on Applying MT^∞



This section includes an example to show the advantages of the MT^∞ criterion compared
to TC^∞. The bulk of the section deals with the efforts needed to adapt the retrieved
components, so we pay no attention to the NE^∞ criterion, which entails
greater efforts in the general case (see Sect. 4). In line with the motivation of MT^∞,
we consider a repository with reusable components from two different domains of
functionality: communication protocols and automata for the control of elevators.





Legend: a = connect, b = confirm, c = tx, d = ack, e = release, f = nack, g = error, h = loss



Fig. 4. Components obtained from the specification of stop-and-wait sender processes 

The components in Fig. 4 were obtained at intermediate specification phases of 
stop-and-wait senders. Events connect, confirm and release serve to model connec- 
tions between sender and receiver; after the sender transmits a data frame (action tx), 
the receiver may acknowledge it positively (ack) or negatively (nack); finally, events 
error and loss can be used to model disruptive behavior of the channel. 





Legend: i = out, j = arrive, k = open, l = in, m = close, n = timeout, p = stuck



Fig. 5. Components obtained from the specification of elevator automata 

Figure 5 shows the elevator automata components. The buttons inside and outside 
the elevator issue in and out events; the door is modeled with open, close and stuck; 
event arrive occurs when the elevator gets to the requested floor; finally, timeout lim- 
its how long the elevator may stay in a given floor before serving other requests. 
Suppose that a user is working on the specification of a new stop-and-wait sender,
which currently consists of the SCTL requirements shown in Eq. (2) (the actions have
been relabeled as in Fig. 4). This partial specification, Spec, will serve to compare the
performance of the TC^∞ and MT^∞ criteria when it comes to avoiding synthesis tasks by
adapting a reusable component.

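$$\mathit{Spec} = \begin{cases} R_1 = a \Rightarrow\!\bigcirc\, (b \Rightarrow\!\bigcirc\, (R_2 \wedge R_3)) \\ R_2 = c \Rightarrow\!\bigcirc\, (d \Rightarrow\!\bigcirc\, (e \Rightarrow\!\bigcirc\, R_1)) \\ R_3 = c \Rightarrow\!\bigcirc\, (f \Rightarrow\!\bigcirc\, (R_2 \wedge R_3)) \end{cases} \tag{2}$$

$$Q = TC^{\infty}(\mathit{Spec}) = \left( (abcde)^{+},\; ab(cf)^{+},\; (ab(cf)^{+}de)^{+} \right) \tag{3}$$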

The TC^∞ Retrieval. The first step of the TC^∞ retrieval is to place the query in the
TC^∞ lattice, by applying the TC^∞ criterion. Then, the second step scrutinizes the
adjacent components to return those which minimize the adaptation efforts to the query.
In our example, the query is the TC^∞ pattern of Spec, shown in Eq. (3).




Fig. 6. Placing the query in the TC^∞ lattice

Figure 6 shows how the repository is organized by applying the TC^∞ criterion. It
can be seen that components from different domains form separate sub-lattices, even
though there exists strong resemblance between some of them. The first step of the
retrieval places Q between components C_1, C_2 and C_3 (see Fig. 6); the second selects
C_2, because it requires the lowest adaptation effort, as measured by the functional
consensus and functional adaptation functions (Section 3).

The MT^∞ Retrieval. In the MT^∞ retrieval, the query is again the TC^∞ pattern of
the specification (Eq. (3)), but it is placed in the MT^∞ lattice by applying the MT^∞
criterion. In this lattice, the different sub-lattices of Fig. 6 become interlaced, because
MT^∞ allows relationships between components from different domains. Moreover, it
finds several components to be equivalent, and therefore merges them into a single
generic one (for instance, GCE_11 stands for C_1 and E_1). As a result, the size of the
repository is reduced from nine components to six.

The first step of retrieval returns components GE_31 and GE_5 (see Fig. 7), and GE_5 is
finally selected, according to the functional consensus and functional adaptation
functions (Section 3), which are straightforward to redefine in terms of meta-events.




Fig. 7. Placing the query in the MT^∞ lattice
Comparing the Two Criteria. Knowing what components the TC^∞ and MT^∞
criteria retrieve, we can look at the adaptation efforts needed in each case to match the
desired functionality. Figure 8 shows components C_2 and GE_5, with their functional
adaptation regarding Q emphasized in gray (meta-events are represented by Greek
letters). In this case, the desired component C_Q is just the result of eliminating the gray
parts of both graphs. Clearly, GE_5 (the component retrieved by MT^∞) has less
functional excess than C_2 (the one retrieved by TC^∞), so its adaptation is easier.




Fig. 8. Adapting the components retrieved by TC^∞ and MT^∞

In fact, the components retrieved by MT^∞ can never be harder to adapt than those
retrieved by TC^∞. As a result of Eq. (1), TC^∞ relationships are kept in the MT^∞
repository, so, in the worst case, the MT^∞ retrieval returns the same components as the
TC^∞ retrieval. Since components from different domains get interlaced in the MT^∞
lattice, it increases the chances to retrieve functionally closer elements, which are
therefore easier to adapt. So, the new source of variability introduced by the use of
generic components (Section 4) turns out to be beneficial.

Reusing Verification Results. The fact that TC^∞ organizes components from
different domains in separate sub-lattices prevents reuse across domains. The
interlacing due to MT^∞ allows sharing verification information between different
domains, increasing the amount of information retrievable for any given query.

In our example, reusing component GE_5 entails recovering all the verification results
of properties not affected by the adaptation process. For instance, assume that
one of the properties previously verified on E_5 is close ⇒⊖ timeout (“the door of the
elevator cannot close if it has not been opened for the length of a timeout”). GE_5
stores it as a generic property expressed in terms of meta-events which, under the mapping
that relates the query to GE_5, becomes release ⇒⊖ ack (“a connection cannot be released
without receiving an acknowledgment first”). As the events involved are not affected by
adaptation, the verification results of E_5 hold directly for the retrieved component.
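
The translation of a verified property through event mappings can be sketched as follows. The `Property` type, the meta-event names `m1` and `m2`, and the operator label "previously" are assumptions made for this illustration; the operator is carried along as an opaque label and its semantics are not modelled.

```python
from typing import NamedTuple


class Property(NamedTuple):
    premise: str
    operator: str      # opaque label; the temporal semantics are not modelled here
    consequence: str


def rename_events(prop: Property, mapping: dict) -> Property:
    """Rewrite the events of a property through an event mapping."""
    return Property(mapping[prop.premise], prop.operator, mapping[prop.consequence])


# Property verified on the elevator component.
elevator_property = Property("close", "previously", "timeout")

# Hypothetical mappings: elevator events -> meta-events -> sender events.
to_meta = {"close": "m1", "timeout": "m2"}
to_sender = {"m1": "release", "m2": "ack"}

generic_property = rename_events(elevator_property, to_meta)
sender_property = rename_events(generic_property, to_sender)
```

Under these mappings, the property stored for the generic component is rewritten into one over `release` and `ack`, so its stored verification result can be looked up for the sender specification without model checking it again.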



6 Discussion and Future Work 

Approaches to managing collections of reusable components are usually based on the
same idea: establishing a profile or a set of characterizing attributes as a representation
of the components, and using it for classification and retrieval. In many cases, as
in [4] and [8], they are based on natural language. Other works resort to formal methods
to describe the relevant behavior of a component, thus minimizing the problems
derived from natural language. In this case, the typical specification matching solutions
rely on theorem provers, which requires a large number of formal proofs and
makes a practical implementation very hard. Several techniques have been proposed
to address this problem in the reuse of source code [9, 14]. In contrast, our proposal is
intended to reuse formal components at early stages, and avoids formal proofs by defining
an approximate retrieval process based on the incompleteness of intermediate
models. Another distinctive feature is that it performs reuse automatically, with no
intervention needed from developers. This helps to solve a major flaw of traditional
repository systems, which offer no help if developers make no attempt to
reuse. In this sense, our work is in line with the active repository system of [13], although
the latter is applied to code and based on natural language.

Section 2 describes how the concept of unspecification, characteristic of the SCTL-MUS
methodology, relates to software variability. In this regard, we believe that
managing a repository of incomplete models is highly useful, as long as its organization
reflects a hierarchy of variability: by choosing the right component to reuse, we
are dealing with the right amount of variability, no more, no less. As far as we know,
our view is innovative in the area of formal specification approaches. Moreover, very 
little has been published about variability in life cycles other than the classical water- 
fall model. Only in [3] is it analyzed in the context of Extreme Programming, which, 
just like SCTL-MUS, defines an iterative life-cycle. 

The incremental approach of SCTL-MUS has two main advantages. First, as eve- 
rything is not-yet-specified by default, variability is not limited to the variation points 
introduced intentionally by developers. Second, as we use the same software artifacts 
all through the development process, the techniques for variability realization are 
greatly simplified. Note that, in a waterfall cycle, different techniques must be used 
depending on the phase at which a variation point is introduced and the phase at 
which it is bound [10]. 

Most of the research on software variability has been done in the field of software 
product lines [12]. A product line defines an architecture shared by a range of prod- 
ucts, together with a set of reusable components that, combined, make up a consider- 
able part of the functionality of those products. The work presented here is not di- 
rectly applicable in this area, because SCTL-MUS specifications lack architecture. 
However, we are currently working to extend this methodology following the principles
of aspect-oriented programming (AOP) [6]. The mechanisms introduced in this
paper still hold in the revised methodology, as long as compositions are expressed
in the same formalisms as the composed elements. In this context, we hope that the
philosophy of the MT^∞ criterion may be useful to detect crosscutting functionality
(the leitmotif of AOP), because it can find common algorithmic patterns between
several components, regardless of the particular names of the events.



7 Summary 

This paper presents some enhancements to an existing framework for the reuse of 
formal requirements specifications. The main contribution is the definition of generic 
reusable components, which, by abstracting their original domain of functionality, 
allow reusing models and formal verification results across multiple application areas. 
This introduces an explicit source of variability in the alphabet of actions of the reus- 
able components, that complements the implicit one inherent to incompleteness. 
A new criterion, MT^∞, has been defined to organize generic components. It enhances
the effectiveness of the retrieval process, by increasing the amount of information
available to reuse, and by reducing the efforts needed to adapt the retrieved
components to the desired functionality. In our experience, it also improves efficiency,
because it simplifies the management of the repository (a single
lattice, fewer components) and, being the only criterion to decide what component
to reuse, it fully automates the retrieval. The presumably greater cost of finding
MT^∞ relationships, compared to TC^∞ or NE^∞ ones, is masked by the incremental approach,
because we can start from the event mappings used in preceding iterations.
Abstract. In order to make software components more flexible and reusable, it
is desirable to provide business users with facilities to assemble and control
them, without first turning those users into programmers. We present our fully
functional prototype middleware system, in which variability is externalized so that
core applications need not be altered for anticipated changes. Application
behavior modification is fast and easy, suitable for a quickly changing
e-commerce world.



1 Introduction 

Domain experts see business processes as "themes and variations". When describing a 
process they first describe its broad sweep or theme, then come back to fill in the 
details as a sequence of variations on the theme. In most cases, these variations
capture business practices or policies that change more often than the rest of a 
business process. Changes can come from inside (e.g., a new discount strategy) or be 
mandated from outside, typically a regulatory agency (e.g., federal tax rate 
computation). Traditionally, these policies have been buried in application code. 
Externalizing them from the rest of the application has many advantages. It makes the 
business policies explicit, increasing both application understanding and consistency 
of business practices. It also allows the enterprise to lower the cost of application 
maintenance. 

This paper describes our fully-implemented infrastructure and framework, Fusion, 
that allows applications to be structured and developed such that the core behavior of 
the business process is built-in while the variations are managed and applied 
externally by business users (or non-programmers - we use these terms 
interchangeably). We will present the overall Fusion architecture and show how it 
addresses three fundamental goals we strived for: empowering the business users to 
define and manage policies, easing the integration with and migration from existing 
applications, and finally minimizing performance impact. 

We will discuss related work before describing our system in detail. 



2 Related Work 

The system described in this paper builds on our previous work “ Extending Business 
Objects with Business Rules” [7], inspired by its deficiencies plus new requirements. 
Previously, we provided programmers facilities to systematically externalize 
variability for management by business users, via trigger points, externally
parameterized code fragments, and a management tool. Our approach was effective
(a successful product resulted), but many capabilities from design to production were
limited or non-existent, especially for business users.

Using terminology introduced in “On the Notion of Variability in Software Product
Lines” [9], our improved middleware system provides “open variation” points
following the “multiple parallel variants” pattern. We describe how variability is
implemented, but leave the discussion of some management issues for future work.

2.1 Expressing Externalized Variable Logic 

How should business users express externalized variable logic? OCL [4] is 
recommended by the Object Management Group for defining constraints in models. 
Although easy to read and write, OCL is a pure specification language - expressions 
are side-effect free, and implementation issues are not addressed. This wasn’t suitable 
for us, since we need to express computation including data modifications. 

SBA [11] allows non-programmers to describe and manipulate information using 
tables, business forms, and reports. Automation is accomplished by giving 
“examples” to the system on how to manipulate information. The techniques used in 
SBA do not extend easily to less data-centric applications, and the advocated “query 
by example” paradigm essentially requires business users to learn a programming 
language. 

2.2 Managing Externalized Variability 

How should externalized variability be managed, design-to-production? The Variation 
Point Model [10] represents variation within a model. Complementary to our work, 
the technique presented could be used for identifying, within core applications, points 
of variability and information available at them (i.e., context). Our work further 
suggests that the variability need not be programmed by a programmer, but can be 
authored by a business user. 

“Support for Business-Driven Evolution with Coordination Technologies” [2] recognizes 
that software development techniques, such as object-oriented ones, do not effectively 
address software evolution. A coordination contract discipline is introduced that helps 
in this domain, but it targets software developers, not business users. 

JAsCo [3] is a set of aspect-oriented techniques for programmers to connect 
business rule implementations with business rules-driven applications. These 
techniques could automate the insertion of variability points in applications (see 
section 3.3). 

2.3 Externalized Variability Systems 

A number of commercial enterprises offer rule-based software [14-17] as a way to 
enable non-programmers to specify application logic. Most of these rule-based 
systems execute using inferencing engines. Because rules are declarative, they are 
considered “natural” to business users. Business users can indeed parameterize 
templates, but more general logic authoring requires a skilled IT person with 
knowledge of inferencing concepts (e.g., working memory, implicit iteration, ...). As 
pointed out in our previous paper [8], inferencing is appropriate to solve a number of 
problems, but our experience is that most externalized variable logic doesn’t require 
inferencing. Meeting with potential users, we learned that many were unwilling or 
unable to employ inferencing-based systems because the results are not 100% 
reproducible. 

“Evaluating Expert-Authored Rules for Military Reasoning” [6] enables subject matter 
experts to manage knowledge directly without formal logic training. The focus is on 
writing sensible rules by non-logicians for an inference engine, while our 
concentration is on enabling authoring of simple, autonomous, and memory-less 
statements. 

“Naked Objects: a technique for designing more expressive systems” [5] unveils a toolkit 
to expose object methods using a noun-verb style. It uses reflection to compose a lone 
user interface view. It allows for just one language based upon the subject Java object 
collection. It requires that participating Java objects implement the Naked Objects 
interface, raising the concern that the business objects themselves become “bloated” in 
order to be all things to all people. In keeping with one of our main goals, easing the 
integration with and migration from existing applications, our work does not place 
any requirements on participating objects, such as the implementation of a particular 
interface; we allow for many views of objects; we facilitate construction of 
customizable languages permitting renaming, hiding, and composing for each 
customized vocabulary constructed. 



3 Fusion 

The Fusion distributed middleware system is comprised of two types of components, 
some used at runtime and others used for development (see Fig. 1). There are three 
basic runtime components: the rule-enabled application (e.g., a Web Application), the 
Connector Registry, and the Runtime Engine. Each of these is briefly described 
below; more details are given later. 

Applications using the Fusion middleware system are manufactured by 
programmers in the usual way, with the exception that rule-enablement occurs 
through the embedding of one or more “Points of Variability” (PoVs), where 
appropriate. PoV embedding is either a manual or an automatic patterned process. 
Development of the application and placement of PoVs in the core application still 
requires programming skills, but this is usually done just once and not changed very 
often. 

The Connector Registry provides for a level of indirection between rule-enabled 
applications and the Runtime Engine. This allows rules to be shared between 
applications without creating dependencies between them. 

The Runtime Engine executes the rules on the production server. 

There are five basic development components: Vocabulary Authoring, Logic 
Authoring, Connector Authoring, Code Generation, and Artifact Management. There 
is also a Deployment component which takes the development artifacts and deploys 
them to runtime entities. Again, each of these components is briefly described below; 
more details are given later. 

The Vocabulary Authoring component provides the means to define terms that can 
later be used in the construction of sentences using the Logic Authoring component. 
Vocabulary terms are comprised of two basic parts: properties and mappings. 
Properties are used mainly to display information to business users, while mappings 
are used to produce executable forms. For example, the alias property for a particular 
vocabulary term might be “the customer’s name” while the invocation mapping might 
be “Customer.getName()”. 

The Logic Authoring component enables business users to express business logic 
in a mistake-proof environment, syntactically speaking. Whether constructing 
compound sentences, performing calculations, or making assignments, business users 
manipulate familiar terms, such as “the customer’s name”, while the Logic Authoring 
component assures that the underlying parameter and return types match accordingly. 

The Connector Authoring component provides for indirect, type-safe, efficient 
linkage between rule-enabled applications and sets of business rules. It produces 
PoVs for inclusion in applications (in order to rule-enable them), and a runtime 
business rule set selection mechanism and transformation service. 

The Code Generation component transforms the business logic authored by 
business users expressed in vocabulary terms familiar to them into an executable 
form. This component processes the outputs of the Logic Authoring and Vocabulary 
Authoring components to produce Java class files. 




Fig. 1. Fusion externalized business logic development and runtime components 



The Artifact Management component and the Library form a repository of 
Fusion-produced artifacts in XML Metadata Interchange (XMI) 
format. By formalizing and separating the data in a structured way, extensions and 
transformations are readily possible. 

The development and deployment aspects of the Fusion system were built as a 
collection of plug-ins to the Eclipse [12] platform plus a browser-based tool. We 
utilize the Eclipse Modeling Framework (EMF) [13] to manage the meta-model of the 
development-time artifacts as well as their persistent instances. 

3.1 Vocabulary 

One of the fundamental problems that we faced was the inherent mismatch between 
the data structures used by the application programmers and those usable by business 
people. Programmer-developed Java code is not understandable by 
non-programmers. Therefore, we decided to offer the business users a more appropriate 
data view. 

The Vocabulary Authoring component presents to business users a simple class 
model, without inheritance. The model consists of a set of entities corresponding to 
business concepts. An entity can have attributes and relationships to other entities as 
well as operations that can be performed. The names of entities and members are 
arbitrary, unlike Java: “The next business day” is a perfectly legal operation name. 

Operation arguments can be infixed, so that “Cancel orders pending more than ___ 
days” can be the name of an operation, where “___” represents the operation’s 
argument. For convenience, users can organize entities into a hierarchy to ease 
development of systems with large numbers of classes. 

Naturally, the members and the relationships presented to business users have to be 
implemented and related to the objects that the application will pass to the Fusion 
Runtime Engine. A possible design choice (and one taken by a number of other 
business rule vendors) is to define Java classes that correspond directly to the model 
presented to business users, requiring application programmers to use those classes 
only. We felt that this was unacceptable for two reasons: 

1. Since the vocabulary model is unlikely to be suitable for the application’s purposes, 
the application programmer is forced to translate the application’s data to and 
from the objects constructed for business users. This translation process adds 
additional burden onto the application programmer, and is likely error-prone. 

2. Business rules are subject to frequent change. Whenever a change requires 
data and/or behavior that are new, the business model will have to be changed. 
If the application program has to consume the objects constructed from this 
model, the application program will be subjected to these changes in turn. 
This would defeat the purpose of externalization. 

Our solution was to support translation from application program objects to 
vocabulary business entities, so that both the business user and the application 
program can use objects suitable for their purposes. This is made more difficult by our 
resolve to make defining this translation process as easy as possible, and to enable the 
translation to be updated when the business model changes. 

The Vocabulary Authoring component maintains three data structures: 

• The vocabulary model, which defines the business user’s model: what 
business entities, members and relationships are available. 

• A model of a subset of the application programs. Specifically, we keep track 
of information about all the application classes and members that are either 
going to be passed to or from the Fusion Runtime Engine, or which are 
necessary to implement the vocabulary model. 

• The “mapping information” which states how each vocabulary business 
entity, member and relationship is implemented. 

We allow vocabulary entities to differ a great deal from the application classes by 
which they are ultimately implemented. Although each instance of a vocabulary 
entity must correspond to an application object, each member and each relationship 
just has to be implemented by a type-correct series of operations performed upon 
application objects. For example, the vocabulary might include “Customer” and 
“Order” entities, in which an operation on Order provides “Does this order’s value 
exceed the customer’s limit?”. This might be implemented by the code 
“(Client.getMaximumOrderLimit(this.getClient()) < this.getTotalPrice())”, where 
“Client” is an application class that maps roughly to the vocabulary entity 
“Customer”. 
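
As a rough illustration of such a mapping, the sketch below pairs the alias shown to business users with an implementation evaluated against application objects. The class VocabularyMember, the field names, and the use of a lambda in place of the stored implementation are our assumptions for illustration only; just the names Client, getMaximumOrderLimit, getClient, and getTotalPrice come from the example above. The comparison is written so that it is true when the order total exceeds the limit.

```java
import java.util.function.Function;

// Hypothetical application classes; only the method names come from the text.
class Client {
    private final double maximumOrderLimit;

    Client(double maximumOrderLimit) {
        this.maximumOrderLimit = maximumOrderLimit;
    }

    static double getMaximumOrderLimit(Client client) {
        return client.maximumOrderLimit;
    }
}

class Order {
    private final Client client;
    private final double totalPrice;

    Order(Client client, double totalPrice) {
        this.client = client;
        this.totalPrice = totalPrice;
    }

    Client getClient() { return client; }
    double getTotalPrice() { return totalPrice; }
}

// Hypothetical vocabulary member: the alias is what business users see,
// the mapping (a lambda standing in for the stored implementation) is what runs.
class VocabularyMember<E, R> {
    final String alias;
    final Function<E, R> mapping;

    VocabularyMember(String alias, Function<E, R> mapping) {
        this.alias = alias;
        this.mapping = mapping;
    }

    R evaluate(E applicationObject) {
        return mapping.apply(applicationObject);
    }
}

class MappingSketch {
    static final VocabularyMember<Order, Boolean> EXCEEDS_LIMIT =
        new VocabularyMember<Order, Boolean>(
            "Does this order's value exceed the customer's limit?",
            order -> order.getTotalPrice() > Client.getMaximumOrderLimit(order.getClient()));

    public static void main(String[] args) {
        Order order = new Order(new Client(500.0), 750.0);
        System.out.println(EXCEEDS_LIMIT.alias + " -> " + EXCEEDS_LIMIT.evaluate(order));
    }
}
```

Note that the application classes carry no Fusion-specific interface in this sketch, which is in keeping with the goal of placing no requirements on participating objects.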

In detail, the mapping information keeps for each vocabulary member and each 
relationship an abstract syntax tree (AST) describing its implementation. This AST is 
rich enough to include all the computational power of Java expressions. The AST 
refers only to information found in the application class model, so that if the 
application classes are modified, we can determine the impact of these changes upon 
the mapping information. Likewise, when the vocabulary model is changed, we can 
determine what mapping information is affected. 

Fig. 2. Fusion Eclipse-based vocabulary management tool 

We provide tools for constructing vocabulary models (and their associated 
mapping information). One tool (shown in Fig. 2) inspects Jar files containing Java 
class files (no Java source is necessary), and constructs a vocabulary model 
corresponding precisely to these classes. The user is then given the ability to: 

• Omit classes which business users shouldn’t use 

• Omit members which aren’t appropriate 

• Rename entities and members to be more appropriate 

• Construct new vocabulary members which don’t correspond to any 
application members 

• Change the implementation of any vocabulary members 

• Hide a method on class A returning class B as the mapping information for a 
relationship between corresponding entities A and B 

To support use-cases where users desire to build a vocabulary model 
independently of existing artifacts (either because they are not suitable for business 
users or because application building is proceeding from scratch), we have another 
tool that takes a UML model and constructs a corresponding vocabulary model. The 
mapping information can then be added with our mapping tool. Thus, Fusion tools 
support both a bottom-up and a top-down approach to defining business models. 

3.2 Logic Authoring 

We provide two tools for authoring rule sets, rules, and templates: one for technically 
capable business users and a second for those less sophisticated. The former is an 
Eclipse-based tool that allows authors to manage all aspects of rule sets, rules, and 
templates, including creation, deletion, and significant modification. The latter is a 
web-based tool that constrains authors to limited changes, such as enabling/disabling 
a rule, or completing a template by choosing an option from a pull-down menu. We 
discuss below the Eclipse-based tool. 



Fig. 3. Fusion Eclipse-based rule set authoring 

Rules and templates are organized into rule sets. Before rules and templates can be 
authored, a rule set must first be created. The rule set author creates the rule set and 
gives it a name. From the available vocabulary (previously defined using the 
vocabulary authoring tool described in Section 3.1), the rule set author selects the 
inputs to and outputs from the named rule set. The inputs to a rule set are collectively 
known as the input group, and similarly the outputs from a rule set are collectively 
known as the output group (see Fig. 3). 

Once a rule set is created, individual rules can be added, updated, or deleted. To 
add a rule to a rule set, first the rule author supplies a rule name. Next, the rule author 
selects terms from the scoped vocabulary (i.e., specific object instances) to create an 
if-then-else statement. The scoped vocabulary available for authoring a rule is 
determined by the input and output groups of the containing rule set. It includes the 
terms in the input and output groups, plus terms that are navigable from them through 
relationships, as well as some built-in vocabulary (such as Date, Time, String, 
Number, equals, less than, etc.). Terms can be selected and dragged into an 
if-then-else form by the rule author (see Fig. 4). For example, the rule set author might define 
“order” in the input group and “discounts” in the output group. The rule author might 
then define a rule using the scoped input and output as follows: 

Senior Citizen Discount rule: If customer is a senior citizen and 
order total is greater than or equal to 200 then add 10% senior 
citizen discount. 

In the above example, the name of the rule is “Senior Citizen Discount rule”. The 
rule author drags and drops vocabulary terms and operators onto the if-then-else form. 
The vocabulary terms used in the example are “customer is a senior citizen”, “order 
total”, and “discount”. The built-in vocabulary terms used in the example are “and”, 
“is greater than or equal to”, and “add”. 

The rule author is prevented from making syntactic mistakes. For instance, when 
using the “is greater than or equal to” vocabulary term the underlying numeric types 
must match. In the above example, vocabulary term “order total” is an integer that 
can be compared to another integer vocabulary term or expression. Any attempt to 
drag-and-drop an incompatible type will fail with a pop-up message. 
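
The kind of check performed on a drop can be sketched as follows. This is a minimal illustration under our own assumptions, not the authoring tool's implementation: the classes Term and Operator and the method accepts are hypothetical, and Java classes stand in for the underlying types of vocabulary terms.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a typed vocabulary term.
class Term {
    final String alias;
    final Class<?> type;

    Term(String alias, Class<?> type) {
        this.alias = alias;
        this.type = type;
    }
}

// Hypothetical built-in operator with one expected type per operand slot.
class Operator {
    final String alias;
    final List<Class<?>> operandTypes;

    Operator(String alias, Class<?>... operandTypes) {
        this.alias = alias;
        this.operandTypes = Arrays.asList(operandTypes);
    }

    // True when the term may be dropped into the given operand slot.
    boolean accepts(int slot, Term term) {
        return operandTypes.get(slot).isAssignableFrom(term.type);
    }
}

class DropCheckSketch {
    static final Operator GREATER_OR_EQUAL =
        new Operator("is greater than or equal to", Integer.class, Integer.class);

    public static void main(String[] args) {
        Term orderTotal = new Term("order total", Integer.class);
        Term customerName = new Term("the customer's name", String.class);
        System.out.println(GREATER_OR_EQUAL.accepts(0, orderTotal));
        System.out.println(GREATER_OR_EQUAL.accepts(1, customerName));
    }
}
```

In a tool built along these lines, a rejected drop would be the point at which the pop-up message is raised.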




Fig. 4. Fusion Eclipse-based rule authoring 



One controversial issue which surfaced was the perceived requirement by some 
that the authored rules always be syntactically correct, even during construction of 
complex statements. This often forced the rule author to author multi-conjunctive 
clauses in a particular order, which we ourselves and our users found irritating. 

A template is comprised of a rule that has portions designated as substitutable. 
Subsequently, a new rule can be created by filling in the substitutable portions. The 
rule template author selects an existing rule for templatizing, gives the template a 
name, and selects those portions subject to substitution. Each substitutable portion is 
assigned a variable name by the rule template author. 

Senior Citizen Discount Template rule: If customer is a senior 
citizen and order total is greater than or equal to <amount> then add 
10% senior citizen discount. 

In the above example, the name of the rule template is “Senior Citizen Discount 
Template”. The rule template author selected from the if-then-else form the term “200” 
and assigned the substitution variable name “amount”. New rules can be defined 
based upon rule templates by less skilled business users employing the browser-based 
tool. 
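
Filling in a template can be pictured with the following sketch. It is a string-based stand-in only: Fusion templates operate on the structured if-then-else form, whereas the hypothetical RuleTemplate class below simply replaces each substitution variable in the rule text and rejects a result that still contains an unfilled variable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical, string-based stand-in for a rule template.
class RuleTemplate {
    private static final Pattern UNFILLED = Pattern.compile("<[A-Za-z]+>");

    final String name;
    final String text;

    RuleTemplate(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // Replaces each <variable> with its value; fails if any variable is left unfilled.
    String instantiate(Map<String, String> values) {
        String rule = text;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            rule = rule.replace("<" + entry.getKey() + ">", entry.getValue());
        }
        if (UNFILLED.matcher(rule).find()) {
            throw new IllegalArgumentException("unfilled substitution variable in: " + rule);
        }
        return rule;
    }
}

class TemplateSketch {
    static final RuleTemplate SENIOR_DISCOUNT = new RuleTemplate(
        "Senior Citizen Discount Template",
        "If customer is a senior citizen and order total is greater than or equal to "
            + "<amount> then add 10% senior citizen discount.");

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("amount", "300");
        System.out.println(SENIOR_DISCOUNT.instantiate(values));
    }
}
```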

Once rule sets, comprised of rules and rule templates, have been completed, they 
are deployed to a test or production system using Fusion deployment tools. Once 
deployed to the runtime, the rule sets are available for use by rule-enabled 
applications. 

Fig. 5. Fusion Eclipse-based template authoring 



3.3 Application Connectivity 

The Vocabulary and Logic Authoring components empower skilled and less skilled 
business users with the ability to author business logic outside an application program 
proper. The functionality offered to business users is limited - not nearly as powerful 
as programming languages such as C++ or Java. Programmers are still called upon to 
author the core application. Connecting the core application with the externalized 
business logic is the challenge met by the Application Connectivity component. 

This component serves a number of purposes. First, it provides indirection between 
the application request for service and the externalized business logic (rule sets) 
authored by business users. Second, it ensures handshaking type safety between the 
core application and the externalized business logic. Asking the programmer to 
construct and pass arrays of objects, as was the case in our prior work [7], is prone to 
error. Third, it promotes definition and externalization of selection criteria (e.g., in 
the insurance industry, the business logic for a particular business purpose could vary 
by date, U.S. State and product). At runtime, the Fusion selector framework employs 
the external information to bind a particular business logic to a particular application 
request. The remainder of this section explains how these three objectives are met. 

The application connectivity component produces two types of artifacts: Points of 
Variability (PoVs) and a Connector Registry comprised of Logical Operations 
(LOps). The former are generated code stubs, deployed with the core application, 
containing methods invoked whenever externalized business logic is employed. A 
PoV includes a name, a selection criteria signature, an evaluation signature, and a 
connected LOp name. The latter matches incoming PoV requests with desired rule 
sets based upon selection criteria. A development-time tool ensures that the 
evaluation signature of the PoV matches or can be transformed into that of the 
associated LOp. 

At development-time, the Connector Registry component is configured by 
providing the list of available LOps. An individual LOp is a contract (or signature), 
similar in purpose to Java interfaces. It is comprised of a name, input expected and 
output produced (signature), a description (purpose), a collection of eligible rule sets 
based upon selection criteria (e.g., start-end dates, state), and a selection context 
signature (type of information that will be available at runtime to select the 
appropriate rule set). For example, a rule-enabled application may request that the 
“determine eligibility” logical operation be employed. The selection algorithm 
employed by the LOp for “determine eligibility” might be: link to rule set 1 on 
even-numbered days; otherwise, on odd-numbered days, link to rule set 2. The input and 
output types are guaranteed to match. 

At runtime, the Connector Registry listens for Business Logic Request (BLR) 
events fired by PoVs. A BLR event encapsulates information about the PoV, the 
selection context which specifies information used to select the most relevant rule set, 
and input parameters needed for the evaluation of the selected rule set. The Connector 
Registry processes a BLR event in three steps: match, evaluate and return. First, its 
selection framework finds the rule set whose selection criteria match the selection 
context of the BLR event. Second, the selected rule set is evaluated using the BLR 
event input parameters to produce desired output (data transformations are performed 
if required). Finally, the evaluation result is returned to the PoV, which returns it to 
the application. 
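
The match, evaluate, and return steps can be sketched as below. All class and method names are hypothetical, since the Fusion API is not given here; the registry is simplified to a single input and output type, the event is modeled as a plain method call, and the eligibility thresholds are invented for the demonstration. The selection criteria mirror the even/odd day example for “determine eligibility”.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// All names are hypothetical; this is not Fusion's API.
class BusinessLogicRequest<I> {
    final String logicalOperation;               // LOp connected to the firing PoV
    final Map<String, Object> selectionContext;  // information used to pick a rule set
    final I input;                               // parameters for the selected rule set

    BusinessLogicRequest(String logicalOperation, Map<String, Object> selectionContext, I input) {
        this.logicalOperation = logicalOperation;
        this.selectionContext = selectionContext;
        this.input = input;
    }
}

class RuleSetBinding<I, O> {
    final Predicate<Map<String, Object>> selectionCriteria;
    final Function<I, O> ruleSet;

    RuleSetBinding(Predicate<Map<String, Object>> selectionCriteria, Function<I, O> ruleSet) {
        this.selectionCriteria = selectionCriteria;
        this.ruleSet = ruleSet;
    }
}

class ConnectorRegistry<I, O> {
    private final Map<String, List<RuleSetBinding<I, O>>> logicalOperations = new HashMap<>();

    void register(String logicalOperation, RuleSetBinding<I, O> binding) {
        logicalOperations.computeIfAbsent(logicalOperation, k -> new ArrayList<>()).add(binding);
    }

    O handle(BusinessLogicRequest<I> request) {
        List<RuleSetBinding<I, O>> candidates = logicalOperations.get(request.logicalOperation);
        if (candidates == null) {
            throw new IllegalArgumentException("unknown logical operation: " + request.logicalOperation);
        }
        for (RuleSetBinding<I, O> binding : candidates) {
            if (binding.selectionCriteria.test(request.selectionContext)) { // step 1: match
                return binding.ruleSet.apply(request.input);                // steps 2 and 3: evaluate, return
            }
        }
        throw new IllegalStateException("no rule set matches the selection context");
    }
}

class RegistrySketch {
    // Rule set 1 on even-numbered days, rule set 2 on odd-numbered days.
    static ConnectorRegistry<Integer, String> build() {
        ConnectorRegistry<Integer, String> registry = new ConnectorRegistry<>();
        registry.register("determine eligibility", new RuleSetBinding<Integer, String>(
            context -> ((Integer) context.get("dayOfMonth")) % 2 == 0,
            age -> "rule set 1: " + (age >= 18)));   // invented threshold
        registry.register("determine eligibility", new RuleSetBinding<Integer, String>(
            context -> ((Integer) context.get("dayOfMonth")) % 2 != 0,
            age -> "rule set 2: " + (age >= 21)));   // invented threshold
        return registry;
    }

    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<>();
        context.put("dayOfMonth", 14);
        System.out.println(build().handle(
            new BusinessLogicRequest<Integer>("determine eligibility", context, 19)));
    }
}
```

Because the application only names the logical operation, the rule sets behind it can be replaced without touching the application code.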

In the future, this last step could be performed by firing a Response Event which 
would encapsulate the initial BLR event and the evaluation result. It would then be 
processed by the PoV that would be listening for such events. Other components (e.g., 
Business Activity Monitoring components) could also listen for these events. 
Furthermore, this communication model would permit Fusion runtime to easily work 
in conjunction with an event-based reactive rule system such as AMIT [1]. 

Rule set selection is dynamically performed at runtime by the selection framework. 
Although selection logic is potentially arbitrary, we designed a default table-driven 
algorithm which, we believe, covers most useful cases. The criteria definition 
corresponds to the structure of the table used by the selector, for example “select logic 
by date, state and purpose.” When connecting logic to applications, the business user 
specifies a particular instance of those criteria (01/01/2000, 06/14/2002, NY, insurance 
premium computation). At runtime the selector algorithm matches instance values 
encapsulated in Business Logic Request Events against the selector tables. Our 
design handles an arbitrary number of dimensions restricted to be of simple type 
(numeric, enumeration, string) and possibly constrained (integer between 1 and 100). 
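
A table-driven selector of this kind might look like the following sketch. The classes SelectorRow and SelectorTable and the rule set name are hypothetical, the table is fixed to the three dimensions of the example (date range, state, purpose) rather than an arbitrary number, and the two dates are read as month/day/year, which is our assumption.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical selector row: one instance of the criteria plus the rule set it selects.
class SelectorRow {
    final LocalDate start;
    final LocalDate end;
    final String state;
    final String purpose;
    final String ruleSetName;

    SelectorRow(LocalDate start, LocalDate end, String state, String purpose, String ruleSetName) {
        this.start = start;
        this.end = end;
        this.state = state;
        this.purpose = purpose;
        this.ruleSetName = ruleSetName;
    }

    // The date range is inclusive at both ends.
    boolean matches(LocalDate date, String requestState, String requestPurpose) {
        return !date.isBefore(start) && !date.isAfter(end)
            && state.equals(requestState) && purpose.equals(requestPurpose);
    }
}

class SelectorTable {
    private final List<SelectorRow> rows = new ArrayList<>();

    void add(SelectorRow row) {
        rows.add(row);
    }

    // Returns the first rule set whose row matches the request values, if any.
    Optional<String> select(LocalDate date, String state, String purpose) {
        for (SelectorRow row : rows) {
            if (row.matches(date, state, purpose)) {
                return Optional.of(row.ruleSetName);
            }
        }
        return Optional.empty();
    }
}

class SelectorSketch {
    static SelectorTable build() {
        SelectorTable table = new SelectorTable();
        table.add(new SelectorRow(LocalDate.of(2000, 1, 1), LocalDate.of(2002, 6, 14),
            "NY", "insurance premium computation", "ny-premium-rules"));
        return table;
    }

    public static void main(String[] args) {
        System.out.println(build().select(
            LocalDate.of(2001, 5, 1), "NY", "insurance premium computation"));
    }
}
```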

Making the selection framework a first class citizen in the architecture achieves 
two objectives. First, it makes rule management easier and more scalable by 
introducing a highly structured partitioning of rules. For example, it enables users to 
work with a particular subset meaningful to an application (e.g., NY rules). In a 
typical scenario, a user updates the rules to match regulatory changes in a specific 
state. Presenting the rules separately by state in the tools helps the user focus on the 
rules for that particular state, without the distraction of rules for other states. Given 
additional restrictions on the criteria dimensions (e.g., lower and upper bound for 
numeric), we are able to verify common business consistency requirements such as 
complete coverage (e.g., check that a premium computation logic exists for every 
state). Second, it enables better runtime performance by sub-setting the rules that are 
considered at runtime. In the insurance example, when a policy component is 
invoked for “NY”, the rule engine need only evaluate the “NY” rules, ignoring those 
for all other states. Rules are selected with a more efficient SQL lookup resulting in a 
performance boost compared with a design that lumps the rules for all 50 states 
together. 
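
The complete-coverage check mentioned above can be illustrated with a small sketch. The class CoverageCheck, its method, and the rule set names are hypothetical; rule sets are assumed to be partitioned by state in a simple map, which is a simplification of the selector tables.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CoverageCheck {
    // Returns the required states for which no rule set is registered, in sorted order.
    static Set<String> uncoveredStates(Set<String> requiredStates,
                                       Map<String, List<String>> ruleSetsByState) {
        Set<String> missing = new TreeSet<>();
        for (String state : requiredStates) {
            List<String> ruleSets = ruleSetsByState.get(state);
            if (ruleSets == null || ruleSets.isEmpty()) {
                missing.add(state);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ruleSetsByState = new HashMap<>();
        ruleSetsByState.put("NY", Arrays.asList("ny-premium-rules"));
        ruleSetsByState.put("CA", Arrays.asList("ca-premium-rules"));
        Set<String> required = new LinkedHashSet<>(Arrays.asList("NY", "CA", "TX"));
        System.out.println(uncoveredStates(required, ruleSetsByState));
    }
}
```

The same partitioning is what lets the runtime consider only the rules registered for the requested state.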

3.4 Deployment and Runtime 

Runtime performance and correctness are critical. While during development it is 
common for there to be (temporary) inconsistencies between artifacts, the deployment 
process must detect and prevent inconsistent updates to the production server. We 
must also optimize rule set execution, to maintain an acceptable level of performance. 

Our runtime engine associates each Logical Operation with a specific data flow 
between Nodes, where each Node is the executable representation of a rule set. 
Currently, the executable representation of a Fusion rule set is a Java class file. 
However, our runtime engine is general-purpose: it supports a plugin mechanism 
whereby other rule set representations can be executed. In fact, earlier in our research 
process we were using another executable representation for Fusion rule sets, and 
switching to Java only required creating a new plugin - no changes to the engine 
itself. 
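
One way to picture such a plugin mechanism is sketched below. The interface NodePlugin, the classes RuntimeEngine, JavaPlugin, and TablePlugin, and the representation names are all hypothetical. A java.util.function.Function stands in for a compiled rule set class, and a lookup table plays the role of an alternative representation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plugin contract: one plugin per executable representation.
interface NodePlugin {
    Object execute(Object executable, Object input);
}

// A Node pairs an executable artifact with the name of its representation.
class Node {
    final String representation;
    final Object executable;

    Node(String representation, Object executable) {
        this.representation = representation;
        this.executable = executable;
    }
}

class RuntimeEngine {
    private final Map<String, NodePlugin> plugins = new HashMap<>();

    void registerPlugin(String representation, NodePlugin plugin) {
        plugins.put(representation, plugin);
    }

    // The engine itself never inspects the artifact; it only dispatches to a plugin.
    Object run(Node node, Object input) {
        NodePlugin plugin = plugins.get(node.representation);
        if (plugin == null) {
            throw new IllegalStateException("no plugin for representation: " + node.representation);
        }
        return plugin.execute(node.executable, input);
    }
}

// Plugin for rule sets represented as Java code (a Function stands in for a class file).
class JavaPlugin implements NodePlugin {
    @Override
    @SuppressWarnings("unchecked")
    public Object execute(Object executable, Object input) {
        return ((Function<Object, Object>) executable).apply(input);
    }
}

// Plugin for a toy lookup-table representation.
class TablePlugin implements NodePlugin {
    @Override
    public Object execute(Object executable, Object input) {
        return ((Map<?, ?>) executable).get(input);
    }
}

class PluginSketch {
    static RuntimeEngine build() {
        RuntimeEngine engine = new RuntimeEngine();
        engine.registerPlugin("java", new JavaPlugin());
        engine.registerPlugin("table", new TablePlugin());
        return engine;
    }

    public static void main(String[] args) {
        Function<Object, Object> discount = total -> ((Integer) total) >= 200 ? "10%" : "0%";
        System.out.println(build().run(new Node("java", discount), 250));
    }
}
```

Supporting a new representation in this sketch means registering one more plugin; RuntimeEngine is left unchanged.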

The deployment process takes place in two steps. First, the output of the 
development tools is transformed into a deployment file. In order to create it, we first 
check that all development artifacts are consistent: the implementations of our Logical 
Operations create values of the correct type, and so forth. Then, each Fusion rule set 
is translated into a Java class file. The output of this process is a single file containing 
all necessary information for deployment to a server. 

In a second step, a deployment file is deployed to a server. Another set of 
consistency checks is done to make sure that the server state is what the deployment 
file expected. This prevents the system from crashing if the server state accidentally 
gets out of sync with the development system. Finally, the system state is updated in 
a transactional manner so ongoing operations always experience a consistent system. 
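
The second step can be sketched as follows, under simplifying assumptions of our own: the server state is reduced to a version number plus a map of rule sets, the consistency check is reduced to comparing that version with the one the deployment file expects, and the transactional update is modeled as a single atomic reference swap. The classes ServerState, DeploymentFile, and Server are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of what the server is currently running (hypothetical structure).
class ServerState {
    final int version;
    final Map<String, String> ruleSets; // rule set name -> executable identifier

    ServerState(int version, Map<String, String> ruleSets) {
        this.version = version;
        this.ruleSets = Collections.unmodifiableMap(new HashMap<String, String>(ruleSets));
    }
}

class DeploymentFile {
    final int expectedServerVersion;
    final Map<String, String> ruleSets;

    DeploymentFile(int expectedServerVersion, Map<String, String> ruleSets) {
        this.expectedServerVersion = expectedServerVersion;
        this.ruleSets = ruleSets;
    }
}

class Server {
    private final AtomicReference<ServerState> state =
        new AtomicReference<>(new ServerState(0, new HashMap<String, String>()));

    // Returns false, leaving the state untouched, when the file does not match the server.
    boolean deploy(DeploymentFile file) {
        ServerState current = state.get();
        if (current.version != file.expectedServerVersion) {
            return false; // server and development system are out of sync
        }
        ServerState next = new ServerState(current.version + 1, file.ruleSets);
        return state.compareAndSet(current, next); // readers see the old or the new state, never a mix
    }

    ServerState snapshot() {
        return state.get();
    }
}

class DeploySketch {
    public static void main(String[] args) {
        Map<String, String> ruleSets = new HashMap<>();
        ruleSets.put("senior-discount", "SeniorDiscount.class");
        Server server = new Server();
        System.out.println(server.deploy(new DeploymentFile(0, ruleSets)));
        System.out.println(server.deploy(new DeploymentFile(0, ruleSets)));
    }
}
```

An operation that took a snapshot before the swap keeps working against the previous, internally consistent state.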



4 Conclusions 

Fusion addresses our three main objectives: usability, integration, and performance. 
We’ve shown the architecture of a prototype middleware that facilitates business user 
control of exposed aspects of applications in a systematic and manageable way 
without need for programmer assistance. We’ve shown development and runtime 
components of our middleware, describing the purpose and operational characteristics 
of each. 

The Fusion system provides application connection componentry which makes 
integration with existing and future rule-driven applications fairly simple. It 
introduces minimal runtime overhead, thus having minimal negative runtime 
performance impact. Fusion does not introduce new runtime objects to support 
business user vocabularies which might adversely affect performance, but rather uses 
existing objects unmodified. Fusion does provide indirection for rule set selection 
through a connection registry which does have some minimal performance cost, but 
with the added advantage of increased flexibility. Ongoing investigation is addressing 
outstanding issues including: integrating variability concepts in the overall application 
lifecycle; extracting legacy rules from code; and rule management. Under the rule 
management umbrella, query capabilities are needed to answer business user 
questions such as: which rule sets are now in force; which rules produce discount 
results; which rules are concerned with senior citizens; and so forth. Additional 
metadata and facilities are needed for managing versions, dependencies, consistency, and, 
at a higher level, to understand compliance with business policies and the impact of 
proposed changes, to name a few. 

We postulate that middleware, embedded systems, and others will employ 
externalized logic techniques in order to create more reusable off-the-shelf 
components in the near future. Using such techniques allows for late binding of core 
general purpose applications with customizations appropriate to the surrounding 
environment. It also shortens the turnaround time between a desire expressed by the 
application owners and the corresponding adjustments, which would normally require 
programmers. 

Abstract. The methodologies of product-line engineering emphasize proactive 
reuse to construct high-quality, less costly products. The requirements for a 
product line are written for the group of systems as a whole, with requirements 
for individual systems specified by a delta or an increment to the generic set [1]. 
Therefore, it is necessary to identify and explicitly denote the regions of 
commonality and points of variation at the requirement level. In this paper, we 
suggest a method for producing requirements that will be a core asset in the 
product line. Briefly, requirements for families of similar systems (i.e. domain) 
are collected and generalized, and then analyzed and modeled. The 
domain requirement as a core asset explicitly manages the commonality and 
variability. Through this method, the reuse of domain requirements can be 
enhanced. 



1 Introduction 

The goal of product line engineering is to support the systematic development of a set 
of similar software systems by understanding and controlling their common and 
distinguishing characteristics [2]. The methodologies of domain engineering have 
been used to develop core assets that can be reused in product line engineering, such 
as requirements, architecture, reusable components, test plans, and other elements 
through domain analysis, domain design and the domain implementation processes. 
To date, much of product line engineering research has focused on the reuse of work 
products relating to the software's architecture, detail design, and code [3]. 

Commonality and variability play central roles in all product line development 
processes. These factors must be considered during the requirement engineering phase 
for successful reuse. Indeed, the variabilities, which are identified at each phase of the 
core assets development, are diverse in the level of abstraction. In the past, 
variabilities have been handled in an implicit manner and without distinguishing each 
core asset’s characteristic. Additionally, the previous approaches have depended upon 
the experience and intuition of a domain expert when recognizing commonality and 
variability. Therefore, it is necessary to define, systematically identify, and explicitly 
represent the variations that can be extracted at the requirement level. Obviously, it 
has to be demonstrated that the decision on properties of the requirements as common 
or optional is rational, because it has an impact on the property of subsequent core 
assets. Furthermore, the minimum action that should be provided for variables is the 
identification of the appropriate variation points. 

In this paper, we suggest a method that systematically develops requirements as a 
core asset to enhance reusability in the context of product line approaches. Domain 
requirements are collected and generalized, then analyzed and modeled while 
explicitly managing the commonalities and variabilities. Furthermore, a domain 
requirement meta-model is presented to incorporate the notion of commonality and 
variability at the requirement level. Various matrices are introduced to ensure 
objectivity when identifying commonalities and variabilities. We make these 
processes concrete by defining specification atoms (Primitive Requirement: PR) and 
composition rules. The variabilities are considered separately on two levels: one to 
determine the properties and the other to prescribe the variation-point type. This 
information is used to build reusable requirement models that can be applied and 
refined in the development of new applications that belong to the same domain type. 



2 Component-Based Domain Engineering 

The general process of product lines is based on the reusability of core assets - 
requirements, architecture, and components. In particular, the development of product 
lines following the component paradigm, shown in Figure 1, consists of two main 
phases [4]. 

Fig. 1. Component-Based Product-Line Process 



The first phase is called Component-Based Software Engineering (CBSE). This 
phase comprises seven major activities, starting with context comprehension and 
requirement analysis, continuing with the combination of componential design and 
component identification, component creation, component adaptation, and finally 
ending with component assembly. The second phase is called Component-Based 
Domain Engineering. In the domain definition step of domain engineering, the 
purpose of the domain is decided, and its scope is confirmed. In the domain modeling 
step, a domain model is obtained by analyzing the domain. Domain analysis must 
identify the stable and variable parts of the domain. Based on the domain model, the 
domain components are identified and the domain architecture is created. The 
artifacts of each step are maintained with their interrelationships and stored in a core 
assets repository. They are then reused as valuable core assets during CBSE. This 
paper explains in detail how to develop requirements as a core asset. 



3 Domain Requirement Definitions 

The notion of a domain is still quite vague in the research community. To ensure 
correct usage of “domain requirement”, the term “domain”, as used in this paper, 
should be defined. In this paper we follow the application-oriented definition of 
domain given by [5] and [6]: “a family or set of systems including common 
functionality in a specified area.” A domain requirement is defined as a common 
requirement that can be reused as a core asset when developing systems in a product 
line. That is, domain requirements describe the requirements of the common skeleton 
of the systems, and also describe, in somewhat abstracted form, the parts that differ 
slightly between the systems. Figure 2 depicts the meta-model of the domain 
requirement. The purpose of the meta-model is to lay down an overall scheme for 
representing domain requirements. It defines several requirement types and the 
variabilities for each requirement. 

Fig. 2. Domain Requirement Meta-Model 



In this paper, we define the new term ‘Primitive Requirement: PR.’ According to 
Wierzbicka, “any semantically complex word can be explicated by means of an exact 
paraphrase composed of simpler, more intelligible words than the original” [7]. 
These simpler and more intelligible words are known as semantic primitives. The 
meanings of requirements can be decomposed into sets of semantic primitives. Each 
resulting semantic primitive is defined as a primitive requirement (PR). That is, the 
term ‘PR’ is used as a building block of a more complex refinement. 

The meta-model has a domain requirement as the central model element. Domain 
requirements are divided hierarchically into two categories: functional and 
non-functional requirements. Functional requirements consist of PRs. PRs define the 
kind of functionality a system should have. A PR is refined into PRelements, which 
are composed of behavior PRelements and static PRelements. Domain requirements 
manage variabilities. These are managed at different levels of detail. At the PR level, 
the common or optional property of a PR can be identified according to the frequency 
with which it appears in the domain. At the lower level, the PRelement level, the 
variation points of PRs which have common or optional properties are recognized. 
Non-functional requirements tend to be related to one or more functional 
requirements [8]. In addition, non-functional requirements exist simultaneously with 
other qualities, and can be influenced positively or negatively. So, any quality item 
that occurs in the non-functional requirements should have a relationship to 
functional requirements and other quality items. Additionally, domain requirements 
may have relationships among themselves. This information about domain 
requirements is explicitly represented in domain models. 



4 Domain Requirement Development Process 



The domain requirement development process presented in this study is shown in 
Figure 3. 

Fig. 3. Domain Requirement Development Process 



4.1 Collecting and Generalizing Domain Requirement 

Domain requirement generalization is a process of categorizing the properties of 
requirements and converting them into generic requirements. The property of a 
domain requirement should be classified objectively. An objective classification can 
be accomplished by collecting and assessing knowledge of similar past solutions - the 
requirements of legacy systems. A PR-Context matrix is constructed for this purpose. 
This matrix provides a basic plan for extracting such commonalities and variabilities 
in a repeatable way. 



4.1.1 Domain Terminology 

The basic concepts and terms which are used in a domain should be defined. Terms 
which are used by external stakeholders as well as in development processes are 
described in the domain terminology. Domain terminology contains descriptions, 
examples, and related terms. Domain terminology generalizes similar terms with the 
same meaning in a domain to an abstracted term, thus reducing the number and 
complexity of terms. Term generalization is an essential prerequisite for 
generalizing a requirement ‘sentence’, which is done in the subsequent steps. 



4.1.2 PR-Context Matrix 

A PR-Context matrix is displayed in Figure 4. All PRs of the legacy systems are 
merged into this matrix, after checking whether each PR already exists among the PRs 
of the other legacy systems. The cell where a PR's row meets a system's column is 
marked ‘O’ if the system has that PR and ‘X’ if it does not. This matrix indicates 
statistically what property a PR may have, without depending on the experience and 
intuition of a domain expert. 

Fig. 4. PR-Context Matrix 

The first generalization step is applied to the systems in the matrix (see Figure 4). 
Legacy systems composed of the same PRs can be generalized into a single named 
“context.” Through this step, the many legacy systems in the domain are arranged 
and the number of columns in the matrix is reduced. The second generalization step 
is applied to the PRs in the matrix. If a PR appears in most contexts or systems, it is 
recognized as a common PR [C]. If a PR appears selectively in many contexts or 
systems but does not have a substitutive PR, it is recognized as an optional PR [P]. If 
substitutive requirements are detected (PR_y1, PR_y2, ..., PR_yn), they can be 
generalized into a generic requirement (PR_yg). The generic PR can then be 
categorized as a CV, which is a common requirement, or a PV, which is an optional 
requirement, according to the summed ‘O’s and ‘X’s. Through this step, the many PRs 
are arranged and the number of rows within the matrix is reduced. 
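
The two generalization steps can be illustrated with a minimal Python sketch. It is not part of the method as published: the function names, the example PRs, and in particular the `common_ratio` threshold used to decide what "most contexts" means are assumptions made for illustration, and the groups of substitutive PRs are assumed to be supplied by the analyst.

```python
from collections import defaultdict

def merge_into_contexts(systems):
    """Step 1: systems with identical PR sets collapse into one context (column)."""
    groups = defaultdict(list)
    for name, prs in systems.items():
        groups[frozenset(prs)].append(name)
    # Naming a context after the systems it merges is an assumption of this sketch.
    return {"+".join(sorted(names)): set(prs) for prs, names in groups.items()}

def classify_prs(contexts, substitutes, common_ratio=0.8):
    """Step 2: label each PR row as C, P, CV or PV.

    substitutes maps a generic PR name to the substitutive PRs it replaces.
    common_ratio is an assumed threshold for "appears in most contexts".
    """
    member_of = {pr: generic for generic, prs in substitutes.items() for pr in prs}
    rows = defaultdict(set)  # row name -> contexts marked 'O'
    for ctx, prs in contexts.items():
        for pr in prs:
            rows[member_of.get(pr, pr)].add(ctx)
    labels = {}
    for row, marked in rows.items():
        common = len(marked) / len(contexts) >= common_ratio
        if row in substitutes:
            labels[row] = "CV" if common else "PV"
        else:
            labels[row] = "C" if common else "P"
    return labels

# Hypothetical legacy systems and their PRs.
legacy = {
    "S1": {"PR_a", "PR_b", "PR_y1"},
    "S2": {"PR_a", "PR_b", "PR_y1"},  # same PRs as S1, so same context
    "S3": {"PR_a", "PR_y2"},
    "S4": {"PR_a", "PR_c", "PR_y2"},
}
contexts = merge_into_contexts(legacy)
labels = classify_prs(contexts, {"PR_yg": ["PR_y1", "PR_y2"]})
print(labels)
```

With these inputs the four systems reduce to three contexts, PR_a is labeled C, PR_b and PR_c are labeled P, and the generic PR_yg is labeled CV because one of its substitutes occurs in every context.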



4.2 Analyzing and Modeling Domain Requirement 

Domain requirement analysis is a process of refining requirements and identifying 
variation points. The analyzed requirements are graphically represented using existing 
modeling methods. Figure 5 shows the concepts and the artifacts involved in the 
domain requirement model, represented with a class diagram and a package 
diagram, respectively. 

Fig. 5. Concepts and artifacts involved in the domain requirement model 



4.2.1 Domain Usecase Model 

The PRs are modeled as a domain usecase model, using a usecase modeling 
technique. A usecase model is a model of the system’s functionality and 
environment [9]. In this paper, a domain usecase model is defined as a usecase model 
with properties. A domain usecase model consists of a domain actor model and 
domain usecase diagrams. This model explicitly represents properties by using an 
extension of the UML. Table 1 shows the UML extensions which will be used in the 
domain usecase model. 



Table 1. UML extensions in Domain Usecase Model 

Domain Actor Model. A domain actor model is constructed by extracting actors and 
the relationships between the actors. Users, external systems, and sensors and 
actuators, which participate in interactions with the external environment, can all be 
actors. The domain actor model represents environmental variability in the actor’s 
property and description. 



Domain Usecase Diagram. Usecases should be functionally cohesive [10]. Each 
usecase should fulfill all related business or domain responsibilities involved in the 
primary actor's goal. A domain usecase diagram is created that summarizes the 
behavioral context of the domain in terms of its actors, its usecases, and the 
relationships between them. The domain usecase diagram should be modified to 
reflect the properties of the domain usecases by utilizing the PR-Usecase matrix 
(Figure 6). 



PR-Usecase Matrix. A PR-Usecase matrix is created to recognize the property of each 
usecase by referring to the domain usecase diagram and the PR-Context matrix. The 
usecase name, PR, and the property of PR are displayed in the matrix. The PRs that 
are contained in each usecase are analyzed. Properties of the domain usecase are 
influenced by the PRs' properties. Figure 6 represents the PR-Usecase matrix. 

Fig. 6. PR-Usecase Matrix 

When analyzing the PR-Usecase matrix, usecase conditions such as the following are 
considered, and usecases are divided and rearranged as necessary. 

(1) PRs are spread over many usecases (① in Figure 6). In this case, the overlapping 
PRs are separated, made into an independent usecase, and connected by an «include» 
relationship. 

(2) A usecase includes optional PRs (② in Figure 6). In this case, the optional PRs 
are separated and connected by an «extend» relationship. 

The property of each reorganized usecase is classified as follows: 

Common Usecase: When a usecase has common PRs (C or CV), it is classified as a 
common usecase and represents an important process in the system (• in Figure 6). 

Optional Usecase: This corresponds to a usecase composed of optional PRs (P or 
PV). It represents a usecase that does not always need to exist when handling a 
process in the system (see Figure 6). 
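
The reorganization and classification rules above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' tooling: each shared PR is given its own included usecase, the generated names (`UC_<PR>`, `<usecase>_ext`) are hypothetical, and the usecases and PRs in the example are invented.

```python
from collections import Counter

COMMON, OPTIONAL = {"C", "CV"}, {"P", "PV"}

def reorganize(usecases, pr_property):
    """Apply conditions (1) and (2) to a PR-Usecase matrix.

    usecases: usecase name -> set of PRs it contains
    pr_property: PR -> "C", "CV", "P" or "PV" (taken from the PR-Context matrix)
    Returns (new usecases, relationships); generated names are hypothetical.
    """
    result, relations = {}, []
    usage = Counter(pr for prs in usecases.values() for pr in prs)
    shared = {pr for pr, n in usage.items() if n > 1}
    # Condition (1): a PR spread over several usecases becomes an included usecase.
    for pr in sorted(shared):
        result[f"UC_{pr}"] = {pr}
    for name, prs in usecases.items():
        own = prs - shared
        for pr in sorted(prs & shared):
            relations.append((name, "include", f"UC_{pr}"))
        optional = {pr for pr in own if pr_property[pr] in OPTIONAL}
        # Condition (2): optional PRs in a mixed usecase move to an extending usecase.
        if optional and optional != own:
            result[f"{name}_ext"] = optional
            relations.append((f"{name}_ext", "extend", name))
            own = own - optional
        result[name] = own
    return result, relations

def classify(prs, pr_property):
    """Common if the usecase has a common PR; optional if it has only optional PRs."""
    return "common" if any(pr_property[p] in COMMON for p in prs) else "optional"

# Hypothetical matrix content.
props = {"PR1": "C", "PR2": "C", "PR3": "P", "PR4": "PV"}
usecases = {"Register": {"PR1", "PR2", "PR3"}, "Order": {"PR1", "PR4"}}
new_ucs, relations = reorganize(usecases, props)
for name, prs in new_ucs.items():
    print(name, sorted(prs), classify(prs, props))
```

Here PR1 is shared by both usecases and is factored out with «include», the optional PR3 is split from Register with «extend», and Order, which is left with only the optional PR4, is classified as an optional usecase.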

4.2.2 PR Specification 

Each PR consists of two types of PRelements: BehaviorPRelement (a perspective 
along a timeline) and StaticPRelement (a perspective along static structure). These are 
analyzed in PR specification. At the PRelement level, the variation points are 
recognized. In [11], a variation point is defined as follows: “a variation point 
identifies one or more locations at which the variation will occur.” In this paper, 
variation points which are identified at the requirement level are divided into four 
types - data, system interface, computation, and control. As Figure 7 indicates, 
variabilities in computation and control can be identified in BehaviorPRelements, and 
variabilities in data and system interface can be identified in StaticPRelements. 

• Computation - a particular function may exist in some systems and not in others. 
This is a variability which may occur in a process itself. For example, a process 
includes business rules or laws, or manipulates an external service. 

• Control - a particular pattern of interaction may vary from one system to another. 
This may occur at a branch point of control flow. 

• Data - a particular data structure may vary from one system to another. This is 
mainly related to variability of input data or output data. 

• System Interface - an external interface may vary from one system to another. This 
is influenced by data variability and the external operator. 




Table 2 shows a PR specification. The behavior PRelements and static 
PRelements, which include variation points, are described in the second column. The 
variants for each variation point are described in the third column. A partial example 
for the functional requirement Register is given in italics in this table. 



Table 2. PR Specification for Functional Requirement 

Notation used in the table: [d] marks a data variability; [i] marks a variability of 
system interface. 



Table 3 shows a specification for a non-functional requirement. 






Table 3. Specification for Non-functional Requirement 

Columns: Non-Functional Requirement | Quality Items | Priority | Is Related to PR | 
Is Affected by NFR 

• Quality Items: Non-functional requirements are refined into several quality items. 

• Priority: Quality items are prioritized as follows: high / medium / low. 

• Is Related to PR: Each quality item is related to one or more PRs. 

• Is Affected by NFR: 
  +: has a positive effect 
  -: has a negative effect 
  ±: has no connection with the other NFR 



4.2.3 PR Analysis Model 

The PRelements are modeled as a PR analysis model. It is modeled using three types 
of analysis classes: boundary class, entity class, and control class [12]. In our 
approach, these classes are used at a very high level as a tool for identifying the set of 
objects that will participate in the usecase, and for allocating the variation points to 
these objects. A boundary domain class is used to model interaction between the 
domain and its actors, and also to clarify the requirements on the domain’s 
boundaries. An entity domain class is used to model long-lived information, such as 
that associated with logical data structures. A control domain class is used to 
represent abstractions of application logic. Table 4 shows the UML extensions which 
will be used in the PR analysis model. 



Table 4. UML extensions in PR Analysis Model 

Concept in DR Model      | UML Construct  | Stereotype 
-------------------------|----------------|----------- 
Computation VP           | Control Class  | «v.p:C» 
External Computation VP  | Control Class  | «v.p:EC» 
Control VP               | Control Class  | «v.p:Ctl» 
Data VP                  | Entity Class   | «v.p:D» 
System Interface VP      | Boundary Class | «v.p:I» 
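
To make the mapping of Table 4 concrete, the sketch below encodes each variation-point type together with the analysis class and stereotype it is allocated to. The class and attribute names are hypothetical, as is the example variation point. Treating an External Computation VP as a behavior-side variability like Computation is an assumption of the sketch, since the four-way classification in Section 4.2.2 does not list it separately.

```python
from dataclasses import dataclass
from enum import Enum

class VPType(Enum):
    COMPUTATION = "C"
    EXTERNAL_COMPUTATION = "EC"
    CONTROL = "Ctl"
    DATA = "D"
    SYSTEM_INTERFACE = "I"

# Analysis class that carries each variation-point type (Table 4).
ANALYSIS_CLASS = {
    VPType.COMPUTATION: "control",
    VPType.EXTERNAL_COMPUTATION: "control",
    VPType.CONTROL: "control",
    VPType.DATA: "entity",
    VPType.SYSTEM_INTERFACE: "boundary",
}

# Computation and control variabilities are found in BehaviorPRelements;
# including EXTERNAL_COMPUTATION here is an assumption of this sketch.
BEHAVIOR_TYPES = {VPType.COMPUTATION, VPType.EXTERNAL_COMPUTATION, VPType.CONTROL}

@dataclass(frozen=True)
class VariationPoint:
    pr: str          # PR in which the variation point was identified
    name: str        # hypothetical label for the varying element
    vp_type: VPType

    @property
    def stereotype(self) -> str:
        return f"«v.p:{self.vp_type.value}»"

    @property
    def analysis_class(self) -> str:
        return ANALYSIS_CLASS[self.vp_type]

    @property
    def prelement_kind(self) -> str:
        if self.vp_type in BEHAVIOR_TYPES:
            return "BehaviorPRelement"
        return "StaticPRelement"

vp = VariationPoint(pr="Register", name="customer record", vp_type=VPType.DATA)
print(vp.stereotype, vp.analysis_class, vp.prelement_kind)
```

A data variation point identified in a static PRelement is thus allocated to an entity class and tagged with the «v.p:D» stereotype.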



4.2.4 Domain Constraints Specification 

The domain requirements are not independent of one another. In order to enhance the 
reusability of requirements, the decomposed requirements need to be composed. The 
rules of composition can be made by identifying the relationships between domain 
requirements. The relationships between domain requirements are classified into six 
types as follows: 



Table 5. Domain Requirement Relationships 

Functional Requirement vs. Functional Requirement 

• Depend-On (PR_1, PR_2): A PR “needs” or “requires” other PRs. For example, if a PR 
  uses data from another PR, or if a PR can be executed only after another PR is 
  finished, the relationship between the PRs is a “depend-on” relationship. 

• Generalization (PR_g, PR_s): A PR can be realized in multiple ways. In the 
  PR-Context Matrix, if substitutive PRs are detected (PR_y1, PR_y2, ..., PR_yn) and 
  they can be converted to a generic PR (PR_yg), the relationship between PR_yg and 
  each substitutive PR is a “generalization” relationship. 

• Alternative (PR_1, PR_2): A PR can be replaced by other PRs. In the PR-Context 
  Matrix, if substitutive PRs are detected (PR_y1, PR_y2, ..., PR_yn), then the 
  relationship between PR_y1, PR_y2, ..., and PR_yn is an “alternative” relationship. 

• Refinement (PR_p, PR_c): A PR can be refined to a further level of detail. That is, 
  the PR is extended further in the hierarchy, to the level of detail the system 
  needs. In the PR specification, the relation between the PR and a PRelement is a 
  “refinement” relationship. 

Functional Requirement vs. Non-functional Requirement 

• Affected (PR_nfr1, PR_f1): Non-functional requirements tend to be related to one or 
  more functional requirements. In a PR specification, this means the relationship 
  between a PR and a quality item. 

Non-functional Requirement vs. Non-functional Requirement 

• Conflicting (PR_nfr1, PR_nfr2): Non-functional requirements exist simultaneously 
  with other qualities and are influenced either positively or negatively. In the PR 
  specification, the relationship is between quality items. 



The domain requirement relationships are described in the domain constraints 
specification using the notation defined in Table 5. 
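
As an illustration of how such a constraints specification could be used when requirements are composed for a new application, the following sketch checks a selection of PRs against two of the relationship types. It is only a sketch: the function name and example PRs are hypothetical, and reading “alternative” as “select at most one of the substitutive PRs” is an assumption, since the definition above only states that the PRs can replace one another.

```python
def check_selection(selected, depends_on, alternatives):
    """Return a list of constraint violations for a set of selected PRs.

    depends_on: list of (pr, required_pr) pairs ("depend-on" relationship)
    alternatives: list of sets of substitutive PRs ("alternative" relationship),
                  interpreted here (assumption) as at most one per group
    """
    problems = []
    for pr, required in depends_on:
        if pr in selected and required not in selected:
            problems.append(f"{pr} depends on {required}")
    for group in alternatives:
        chosen = sorted(selected & group)
        if len(chosen) > 1:
            problems.append("alternatives selected together: " + ", ".join(chosen))
    return problems

# Hypothetical constraints and selection.
depends_on = [("PR_order", "PR_register")]
alternatives = [{"PR_y1", "PR_y2"}]
problems = check_selection({"PR_order", "PR_y1", "PR_y2"}, depends_on, alternatives)
print(problems)
```

For this selection the check reports both a missing prerequisite (PR_register) and two substitutive PRs chosen together.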



5 Related Works 

Requirement Engineering for product families. The requirement engineering 
process is often described as cyclic, with each cycle consisting of elicitation, analysis, 
specification, validation, and management activities [13][14]. These activities must be 
reconsidered in the context of a product line. In recent years, several studies have 
addressed requirement reuse. Kuusela and Savolainen describe a definition 
hierarchy method for requirements capturing and structuring [15]. Thompson and 
Heimdahl focus on structuring product family requirements for n-dimensional and 
hierarchical product lines [16]. These methods focus on the structure of the domains 
themselves to identify architectural drivers and to guide the later creation of the 
architecture. Product Line Analysis (PLA) combines traditional object-based analysis 
with FODA (Feature-Oriented Domain Analysis) modeling and stakeholder-view 
modeling to elicit, analyze, specify, and verify the requirements for a product line 
[17]. The minimal support that should be provided for variabilities is the identification 
of the appropriate variation points. Nevertheless, all of these methods lack a 
requirement model which incorporates the notion of commonalities and variabilities 
with the possibility of accessing other development elements. 

Identifying the commonalities and variabilities in Product-Line Engineering. 
Domain analysis techniques can be used to identify and document the commonalities 
and variabilities in related systems in a domain. The Software Engineering Institute’s 
FODA is a domain analysis method based upon identifying the features of a class of 
systems [18]. FeatuRSEB is an extension of the RSEB with an explicit domain 
analysis phase based on the FODA model [19]. ODM (Organization Domain 
Modeling) was developed to provide a formal, manageable, and repeatable approach 
to domain engineering. The ODM method does not directly address the ongoing 
management of domain models and assets [5]. SCV (Scope, Commonality, and 
Variability) analysis from Lucent Technologies identifies, formalizes, and documents 
commonalities and variabilities [20]. The result of this analysis is a text document. It 
is clear that the variable elements which can be identified during each phase of the 
development of core assets should be specified at different levels of detail. 
Nevertheless, many domain analysis methods have handled these variable 
elements implicitly, at the same level and without distinguishing between the core 
assets. Additionally, the existing domain analysis approaches have depended on the 
experience and intuition of a domain expert to identify commonalities and 
variabilities, which are the most important features in domain analysis. That is, there 
are no rules that enable engineers to identify domain elements easily. 

Assessments. We compared our method to other methods. The assessment of the 
proposed method is shown in Table 6. 



Table 6. Comparison between related works 

Comparison Points                                           | [3] | [15] | [16] | PLA [17] | Proposed Method 
------------------------------------------------------------|-----|------|------|----------|---------------- 
Identify the commonality and variability                    | Yes | Yes  | Yes  | Yes      | Yes 
Demonstrate that the decision of commonality is rational    | No  | No   | No   | No       | Yes 
Identify variation points at requirement level              | No  | No   | No   | No       | Yes 
Classify variation point type                               | No  | No   | No   | No       | Yes 
Establish relations between requirements                    | No  | Yes  | Yes  | No       | Yes 
Explicitly represent the properties in requirement model    | Yes | No   | No   | No       | Yes 
Have a systematic process to develop domain requirement     | Yes | No   | No   | Yes      | Yes 
Demonstrate the applicability of the method                 | Yes | Yes  | Yes  | Yes      | Yes 



6 Conclusions 

Product line engineering and requirement engineering processes are each 
essential for increasing the speed at which high-quality products are developed and 
for reducing costs. What seems to be lacking, however, is a combination of the 
concepts of product line and requirement engineering. The requirements of the many 
application systems in a product line can be a reusable core asset, just like 
architecture, components, and test plans. Therefore, in this paper, we have proposed a 
method to develop requirements as a core asset in a product line. Requirements for 
families of similar systems (i.e., a domain) are collected and generalized, and then 
analyzed and modeled while explicitly managing the commonalities and variabilities. 
We attempted to demonstrate that the decision concerning the regions of commonality 
is rational by using various matrices. Additionally, points of variation that could be 
identified at the requirement level were classified and explicitly represented in the 
requirement model. 

Our future research activities include the development of domain architecture, 
especially with respect to variability management. In particular, variation points at the 
architecture level must be derived from variation points at the requirement level and 
then refined. This would contribute to a more efficient management of core assets in 
product lines. 



Abstract. Software product-line engineering aims at improving the efficiency 
and effectiveness of software development by exploiting the product line 
members’ commonalities and by controlling their variabilities. The duality of 
commonalities and variabilities holds for all kinds of assets ranging from 
requirements specifications over design documents to test cases. A decision model 
controls the way a product can be distinguished from the rest of the family and 
is used to extract product-specific information (e.g., product requirements) 
from the family specifications. Though we traditionally employ decision 
models for generating code, we aim at capitalizing on the investment for designing 
the decision model by leveraging it to generate test cases. In this paper we 
focus on acceptance testing of functions and features, and introduce our approach 
of using the decision model concept to maintain and generate acceptance test 
cases for one of our major product lines. Preliminary evaluation of this method 
demonstrates very promising savings of space and effort as compared to 
conventional methods. 



1 Introduction 

Software product-line engineering is a method for developing families of systems as 
opposed to the traditional way of developing single systems one at a time. The goal is 
to improve the efficiency and effectiveness of software development by exploiting the 
family members’ commonalities and by controlling their variabilities [1] [2] [4] [15]. 
Exploring the commonalities of the product-line members allows us to avoid 
redeveloping common aspects of our software over and over again. Predicting the 
variabilities of the product-line members enables us to prepare and design our 
software for the necessary changes [5]. This duality of commonalities and 
variabilities holds for all kinds of assets ranging from requirements specifications 
over design documents to test cases. 

A product-line approach allows for generating the product-line members’ code 
from a common product-line infrastructure instead of having to develop them 
independently. One of the first steps of Avaya’s product-line process is to conduct a 
commonality analysis that identifies the common aspects as well as the parameters of 
variation for the family. Based on this information we can then derive a decision 
model that guides the extraction of information that is specific for one family member 
[15]. Though a decision model’s main purpose is to generate code, we aim at 
capitalizing on the investment we make for conducting the commonality analysis and 
designing the decision model by leveraging it to generate other assets such as product 
documentation or test cases. 

J. Bosch and C. Krueger (Eds.): ICSR 2004, LNCS 3107, pp. 35-48, 2004. 

There are three main levels of testing: unit testing, integration testing, and system 
testing. In unit testing each developer tests his or her components before integrating 
them with the rest of the code. This is usually a white-box approach. In integration 
testing we test the interaction between the components and make sure that they follow 
the interface specifications and work together properly. Integration testing can be a 
combination of white-box and black-box testing. System testing tests the features and 
functions of an entire product and validates that the system works the way the user 
expects. It is usually a black-box approach. While for unit and integration testing we 
need source code, system testing can be done independently from source code. 

In this paper we focus on acceptance tests, which are black-box system tests that 
validate the customer facing features of the products. As such they are performed on 
the product deliverables. If possible, acceptance tests are defined before the feature is 
implemented. Acceptance test plans are textual descriptions of the functional tests that 
need to be executed. We typically employ a descriptive input-output test oracle and 
describe a test case in terms of actions to be performed on the system and expected 
system behavior. The latter is based on the product specification. Traditionally we 
manually create our acceptance test cases and store them in a Word document with 
tables. The specification of which test cases belong to which product and/or to which 
feature is managed by arranging the tables in separate documents and sections. Each 
table has three columns: step name, step description, and expected results. 

Figure 1 shows three sample test cases for an application called AV¹ that is part of 
an airline help system and that we will use as a running example. A product can offer 
several applications and an application can have several tools. For AV we assume the 
existence of a phrase manager, a tool for adding entries to the phrase database, which 
stores common phrases that can be used by the airline agents. We also assume the 
existence of a prompts manager, a tool for setting filters for displaying the phrases. 
Though we test different tools with slightly different behavior (one pops up a dialog 
window, the other doesn’t) and for different languages, we can see that the sample 
test cases have many commonalities. 

Even though the presented examples are simplified, they reflect the actual situation 
of our current test suite. We can consider the set of test cases as a family itself. In 
general, we have much redundancy among the test cases of similar products. Some 
redundancy even exists within the tests of the same product. Also, due to the lack 
of modularization, a group of steps can be repeated many times in the document, 
resulting in very large testing documents that are hard to understand and maintain. 



¹ Note that test cases as well as names of applications, tools etc. are examples created for the 
purpose of this paper and do not reflect actual Avaya terms or products. However, the 
structural properties of illustrated test cases are very similar to what is used for real products. 



Test Case 1 - Korean 

Step Name: Step 1 
  Description: 
    • Start AV application 
    • Open prompts manager 
    • Right click in the prompts manager’s text window and enable the IWR to 
      accept Korean characters 
    • Enter Korean text 
  Expected Results: 
    • Verify that IWR is enabled and Korean characters can be entered and are 
      displayed correctly 

Step Name: Step 2 
  Description: 
    • Click on the existing OK button in the prompts manager 
  Expected Results: 
    • Verify that the dialog window that pops up displays the Korean characters 
      correctly 



Test Case 2 - Japanese 

Step Name: Step 1 
  Description: 
    • Start AV application 
    • Open phrase manager 
    • Right click in the phrase manager's text window and enable the IWR to 
      accept Japanese characters 
    • Enter Japanese text 
  Expected Results: 
    • Verify that IWR is enabled and Japanese characters can be entered and are 
      displayed correctly 



Test Case 2 - Korean 

Step Name: Step 1 
  Description: 
    • Start AV application 
    • Open phrase manager 
    • Right click in the phrase manager’s text window and enable the IWR to 
      accept Korean characters 
    • Enter Korean text 
  Expected Results: 
    • Verify that IWR is enabled and Korean characters can be entered and are 
      displayed correctly 



Fig. 1. Three Sample Test Cases 



The remainder of the paper presents a product-line view on a set of legacy 
acceptance tests and is organized as follows. Section 2 introduces our decision model 
based testing approach that we developed to support the testing process of one of our 
major product lines. Section 3 gives an overview of a first application of the approach. 
We discuss related work in Section 4 and conclude with a summary and outlook in 
Section 5. 



2 A Family of Test Cases 

Behind a product family there is always a family of test cases. The question is how to 
exploit the reuse potential given by the commonalities and variabilities of our test 
case family and how to generate those members of our test case family that are 
relevant for a given product. First we define a set of generic test cases by suitable 
parameterization. In common product-line terminology this means we create the core 
assets of our test case family. As we are not developing a product line from scratch, 
but rather building upon a set of legacy systems, we start by analyzing existing test 
cases and generalize them for creating the family’s assets. For generating a test suite, 
we then select generic test cases that fit the product under consideration and 
instantiate them accordingly. The decision model is an integral part of our selection 
and instantiation algorithms. 



2.1 Test Case Generalization 

We assume that test cases are described as a sequence of test steps, each consisting of 
a list of description items and an expected result item (Figure 1). Description items 
and expected results are building blocks from which we can assemble actual test steps 
and test cases. Some building blocks are identical across test cases. For instance, item 
“Start AV application” from Figure 1 is copied in all three test cases. For further 
exploiting commonalities between test cases, we parameterize building blocks, so that 
they become reusable over a larger set of test cases. Parameterization of building 
blocks is driven by the parameters of variation from the commonality analysis².

For our example we assume three parameters of variation: (1) parameter
$A \in \{\text{AV}, \text{BV}, \text{CV}\}$ with a set of integrated applications as possible values,
(2) parameter $L \in \{\text{Korean}, \text{Japanese}\}$ with a set of supported languages as possible
values, and (3) parameter $M \in \{\text{phrase manager}, \text{prompts manager}\}$ with a set of integrated tools as
possible values. Note that parameters A, L, and M have already been identified by the 
commonality analysis of our product line and do not result from a separate common- 
ality analysis of an existing test case suite. We capitalize on that investment when 
creating the core assets for our test case family. The idea is to take an existing test 
case and systematically search its description and result items for values of the pa- 
rameters of variation. Having found an item with such a value, the chances are high 
that we will find that very same item in other test cases, but with a different value 
from the range of the corresponding parameter of variation. 

Let us look at an example. We can, for instance, use parameter of variation A for
generalizing item “Start AV application”, since AV is a possible value of A. We then
turn the concrete description item “Start AV application” into the generic description
item Item1(A) = “Start (A) application”. Once an item is parameterized this way, we
check whether all possible instantiations occur in the existing test suite. If this is not the
case, we must analyze whether the test suite needs to be extended (i.e., we found an
indication of missing test cases) or whether the instantiation of the parameters is restricted
to specific parameter settings. In the latter case we annotate a constraint that shows which
instantiations are permissible. In the case of Item1(A) we know that the other two
applications BV and CV also need to be started, so no constraint is necessary. However,
if the description items were more specific, stating that AV and BV must be
started via a pull-down menu while CV is started by double-clicking an icon on the
desktop, we would need to specify constraints to make sure that we instantiate
the generic item only with values AV and BV, but not with value CV.
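
The parameterization of a single building block can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name GenericItem, the "(A)" placeholder syntax, and the variant item for the pull-down menu case are hypothetical and not part of the method's published notation.

```python
from dataclasses import dataclass, field
from itertools import product

# Parameters of variation and value ranges from the running example.
PARAMETERS = {
    "A": ["AV", "BV", "CV"],
    "L": ["Korean", "Japanese"],
    "M": ["phrase manager", "prompts manager"],
}


@dataclass
class GenericItem:
    """Hypothetical model of a generic item such as Item1(A)."""
    template: str                                   # placeholders look like "(A)"
    params: tuple                                   # parameters the item depends on
    constraint: dict = field(default_factory=dict)  # permitted values; empty = unconstrained

    def permits(self, assignment):
        """True if the value assignment satisfies the annotated constraint."""
        return all(assignment[p] in allowed
                   for p, allowed in self.constraint.items())

    def instantiate(self, assignment):
        """Replace every placeholder by its assigned value."""
        if not self.permits(assignment):
            raise ValueError(f"{assignment} violates {self.constraint}")
        text = self.template
        for p in self.params:
            text = text.replace(f"({p})", assignment[p])
        return text

    def instantiations(self):
        """All permissible concrete items over the parameter ranges."""
        result = []
        for values in product(*(PARAMETERS[p] for p in self.params)):
            assignment = dict(zip(self.params, values))
            if self.permits(assignment):
                result.append(self.instantiate(assignment))
        return result


# Item1(A) needs no constraint: all three applications must be started.
item1 = GenericItem("Start (A) application", ("A",))

# Hypothetical, more specific item: only AV and BV start via a pull-down menu.
item1_menu = GenericItem("Start (A) application via the pull-down menu",
                         ("A",), constraint={"A": ["AV", "BV"]})

print(item1.instantiations())
print(item1_menu.instantiations())
```

Keeping the constraint next to the template means that an impermissible instantiation (here A = CV for the menu variant) is rejected at the building-block level, before any test case is assembled.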



² The FAST commonality analysis introduces a parameter of variation for each identified
variability and specifies the range of possible values.
In general, we can use the following procedure for test case generalization:

(1) Select a test case from the existing test case suite and extract its concrete descrip- 
tion and result items. 

(2) Scan each concrete item for values that are also defined for a parameter of varia- 
tion from the commonality analysis. We call each item that contains such a value 
“candidate for parameterization”. 

(3) Generalize each candidate for parameterization by replacing the identified 
value(s) by the corresponding parameter of variation. Now, it may be possible 
that not all instantiations by elements from the value range of the corresponding 
parameter of variation are valid, i.e., a specific instantiation may not be used in 
an existing test case or cannot even potentially be used for a complementary test 
case. In that case, we need to annotate the generic item with a constraint defining 
the permissible value assignments. 

(4) Replace the original test case from the existing test case suite by its generalized 
version. Check if other existing test cases are instantiations of the generic test 
case and remove them from the set. 

(5) Also scan for instantiations of the newly identified generic description and result 
items in the remaining existing test cases and replace them by their generic ver- 
sions. That means we partially generalize the remaining test cases. 

We follow this process until all test cases are replaced by a generic version, or until the
only concrete test cases left are those that are common across the product family³.
When developing new test cases, we check which of the building blocks can be used
for defining them. We also check whether a new test case is useful for other products and
generalize it accordingly. Note that the set of generalized test cases may cover more
concrete test cases than the existing test suite, if the latter turned out to be incomplete.
This is an intended effect that comes with systematically exploring the test case space,
driven by the results of the commonality analysis⁴.
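
Steps (2), (3) and (5) of the procedure lend themselves to a small sketch. The Python below is a minimal illustration, assuming that items are plain strings in a canonical format and that each item mentions at most one value per parameter; the function names generalize, group_by_generic and unseen_values are hypothetical.

```python
import re

# Parameters of variation and value ranges from the running example.
PARAMETERS = {
    "A": ["AV", "BV", "CV"],
    "L": ["Korean", "Japanese"],
    "M": ["phrase manager", "prompts manager"],
}


def generalize(item):
    """Steps (2) and (3): replace parameter values by placeholders.

    Returns the generic text and the value assignment that was found.
    An empty assignment means the item is no candidate for parameterization.
    """
    generic, found = item, {}
    for name, values in PARAMETERS.items():
        for value in values:
            pattern = r"\b" + re.escape(value) + r"\b"
            if re.search(pattern, generic):
                generic = re.sub(pattern, f"({name})", generic)
                found[name] = value
    return generic, found


def group_by_generic(items):
    """Step (5): collect the concrete items that instantiate the same generic item."""
    groups = {}
    for item in items:
        generic, assignment = generalize(item)
        groups.setdefault(generic, []).append(assignment)
    return groups


def unseen_values(groups, generic, name):
    """Values of a parameter that never occur for a generic item.

    Each unseen value hints at a missing test case or at a needed constraint.
    """
    seen = {a[name] for a in groups[generic] if name in a}
    return [v for v in PARAMETERS[name] if v not in seen]


items = ["Start AV application", "Open phrase manager",
         "Open prompts manager", "Enter Korean text"]
groups = group_by_generic(items)
print(groups)
print(unseen_values(groups, "Enter (L) text", "L"))
```

The list returned by unseen_values is exactly the place where a human decision is needed: either the suite is extended or the generic item is annotated with a constraint.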

Figure 2 illustrates the results of test case generalization for the sample test cases
introduced in Figure 1. As can be seen, the three test cases have many (parameterized)
items in common. In total, we identified five description items. All of them are
parameterized and can now be reused across test cases. There are two result items that
are both parameterized. Item Result2 is parameterized to cover a complementary test
case (TestCase1-Japanese) that was not part of the original test suite of Figure 1. In
total, we have used three parameters of variation from the commonality analysis for
generalizing the building blocks.

The presented example is simplified, but reflects the structure of test suites we
have analyzed. Parameterizing building blocks in this way reduces maintenance effort
considerably. Based on our experience, the number of description items can be reduced
on average by approximately 70% and the number of expected result items by 60%.

³ We make a critical assumption here, namely that description and result items come in a
certain canonical format. See Section 5 for more information on future work on canonical
forms.

⁴ Note that optimization of test case suites is not within the scope of this paper. We are
looking into test case optimization as future work.
Test Case 1 - Korean

Step 1
  Description:
    • Item 1 (A = AV)
    • Item 2 (M = prompts manager)
    • Item 3 (M = prompts manager, L = Korean)
    • Item 4 (L = Korean)
  Expected Results:
    • Result 1 (L = Korean)

Step 2
  Description:
    • Item 5 (M = prompts manager)
  Expected Results:
    • Result 2 (L = Korean)


Test Case 2 - Japanese

Step 1
  Description:
    • Item 1 (A = AV)
    • Item 2 (M = phrase manager)
    • Item 3 (M = phrase manager, L = Japanese)
    • Item 4 (L = Japanese)
  Expected Results:
    • Result 1 (L = Japanese)


Test Case 2 - Korean

Step 1
  Description:
    • Item 1 (A = AV)
    • Item 2 (M = phrase manager)
    • Item 3 (M = phrase manager, L = Korean)
    • Item 4 (L = Korean)
  Expected Results:
    • Result 1 (L = Korean)


Parameters

$A \in \{\text{AV}\}$
$L \in \{\text{Korean}, \text{Japanese}\}$
$M \in \{\text{prompts manager}, \text{phrase manager}\}$

Generic Building Blocks

Description Items I:

Item 1 (A): “Start (A) application” / (no constraints)
Item 2 (M): “Open (M)” / (no constraints)

Item 5 (M): “Click on the existing OK button in the (M).” / (M = prompts manager)

Expected Results/Outputs Items O:

Result 1 (L): “Verify that IWR is enabled and (L) characters can be entered and are
displayed correctly.” / (no constraints)
Result 2 (L): “Verify that the dialog window displays the (L) characters correctly.”
/ (no constraints)

Generic Test Cases

Step1 (A, L, M) = ( Item1(A); Item2(M); Item3(L, M); Item4(L); Result1(L) )
Step2 (L)       = ( Item5(M); Result2(L) )
TestCase1 (L)   = ( Step1(A, L, M); Step2(L) )
TestCase2 (L)   = ( Step1(A, L, M) )

Fig. 2. Generalized Test Cases



Towards Generating Acceptance Tests for Product Lines 






Generalizing a test suite results in a set of generic test cases which reference generic 
description and result items including their constraints. We may also have test cases 
that cannot be generalized and remain in their original form. In the following, we discuss
how to generate a concrete test suite from the set of generic test cases.



2.2 Decision-Model Based Test Case Generation 

Instantiating the set of generic test cases with all possible values from the parameters 
of variation would typically result in an invalid set of test cases. We need a way to 
implement the constraints that are associated with the generic description and result 
items. 

Constraint Implementation 

We leverage the decision model for that purpose by attaching generic test cases to all
leaf nodes of the decision tree that represent valid parameter assignments for the
generic test case. For our running example, Figure 3 shows the decision tree, which
already has generic test cases attached to its leaf nodes. Since each path in the decision
tree represents a value assignment for the parameters of variation, we expect that
those assignments are also valid for the attached generic test cases. A value assignment
is valid for a generic test case iff all the constraints from the generic description
and result items that are referenced by the generic test case hold for the given
value assignment. For space reasons, we do not further expand on the algorithm for
attaching generic test cases to the decision tree, but it is straightforward. The point is
that if we assign each generic test case to all the leaf nodes that represent valid value
assignments for the generic test case, we have solved the constraint problem. In order
to generate a concrete test case suite for a specific product, we follow the decision
tree, which binds the product family’s parameters of variation, pick up the generic
test cases attached to the leaf nodes that we visit, and instantiate them with the
collected value assignment.
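
The attachment algorithm itself is not given in the paper. The following Python is therefore only one possible sketch of the validity check, under these assumptions: every complete value assignment corresponds to one leaf, the only constraint is the one Figure 2 lists for Item 5, and Items 3 and 4 (whose definitions are not shown in Figure 2) are unconstrained. The resulting attachment follows from these assumptions alone and need not coincide with Figure 3.

```python
from itertools import product

# Value ranges as listed in Figure 2.
PARAMETERS = {
    "A": ["AV"],
    "L": ["Korean", "Japanese"],
    "M": ["phrase manager", "prompts manager"],
}

# Constraints per generic item; an empty dict means "no constraints".
# Item3 and Item4 are ASSUMED to be unconstrained.
ITEM_CONSTRAINTS = {
    "Item1": {}, "Item2": {}, "Item3": {}, "Item4": {},
    "Item5": {"M": {"prompts manager"}},
    "Result1": {}, "Result2": {},
}

# Generic test cases, flattened to the items they reference (Figure 2).
GENERIC_TEST_CASES = {
    "TestCase1": ["Item1", "Item2", "Item3", "Item4", "Result1",
                  "Item5", "Result2"],
    "TestCase2": ["Item1", "Item2", "Item3", "Item4", "Result1"],
}


def is_valid(test_case, assignment):
    """A value assignment is valid iff every referenced item's constraint holds."""
    for item in GENERIC_TEST_CASES[test_case]:
        for param, allowed in ITEM_CONSTRAINTS[item].items():
            if assignment[param] not in allowed:
                return False
    return True


def attach_to_leaves():
    """Map each complete value assignment (one leaf each) to its valid test cases."""
    names = list(PARAMETERS)
    leaves = {}
    for values in product(*(PARAMETERS[n] for n in names)):
        assignment = dict(zip(names, values))
        leaves[values] = [tc for tc in GENERIC_TEST_CASES
                          if is_valid(tc, assignment)]
    return leaves


for leaf, attached in attach_to_leaves().items():
    print(leaf, "->", attached)
```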

For instance, following the leftmost path in the decision tree of Figure 3 would
select a family member that integrates application AV, supports the Korean language,
and also integrates the phrase manager. We can easily verify that the generic test case
that is attached to the leftmost leaf node is valid for the chosen value assignment of
A=AV, L=Korean, and M=phrase manager. The decision model would allow us to add
support for the Japanese language, too. This would be done by backtracking to the
language node and also following the edge labeled “Japanese”. We would then pick up
the same generic test case again, but now at the rightmost leaf node. However, the
extended value assignment A=AV, L={Korean, Japanese}, and M=phrase manager
now gives us two instantiations of that generic test case.
Fig. 3. Decision Tree



Test Generation Algorithm 

This section discusses in more detail how we generate a concrete test case suite for a 
specific member of our product family. For that we need a more precise definition of 
our notion of decision tree. 

A decision tree can be defined as a tuple $D = (N, E, PV, V, n, e)$, where $N$ denotes
a set of nodes (called decision points), $E \subseteq N \times N$ denotes a set of edges, so that $E$
describes a tree, $PV$ denotes the set of parameters of variation from the commonality
analysis, $V$ denotes the universe of possible values for the parameters of variation, $n$
denotes a labeling function $n: N \to 2^{PV}$, and $e$ denotes a labeling function
$e: E \to (PV \to V)$. The labeling function $n$ assigns a set of parameters of variation to each
node in the decision tree, so that each parameter of variation is assigned to at most
one node of each (complete) path in the tree. The parameters of a complete path are
sufficient to characterize a product, but we do not need all parameters of variation
for each product characterization. That is, if $n_1, \ldots, n_m \in N$ is a complete path in $D$,
then $\bigcup_{1 \le i \le m} n(n_i)$ is a valid product characterization and there exists no $p_h \in PV$ with
$p_h \in n(n_j)$ and $p_h \in n(n_k)$ for $j \neq k$ and $1 \le j, k \le m$. For each edge in the tree, the
labeling function $e$ gives a value assignment for the parameters of variation that are listed
with the parent node. That is, $e((n_j, n_k))(p_i)$ is defined iff $p_i \in n(n_j)$, $n_j \in N$, $n_k \in N$,
$(n_j, n_k) \in E$, and $p_i \in PV$.
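
The tuple definition maps naturally onto a small data structure. The sketch below is an illustration rather than the authors' implementation: the class DecisionTree, its field names, and the leaf names Leaf1 to Leaf4 are hypothetical, and the example tree is an assumed reconstruction of the running example since Figure 3 is not reproduced here. The check is_well_formed encodes the two conditions stated above: no parameter labels two nodes on the same complete path, and each edge assigns values to exactly the parameters of its parent node.

```python
from dataclasses import dataclass


@dataclass
class DecisionTree:
    root: str
    edges: list         # E: (parent, child) pairs forming a tree
    node_params: dict   # n: node -> set of parameters of variation
    edge_values: dict   # e: (parent, child) -> {parameter: value}

    def children(self, node):
        return [c for (p, c) in self.edges if p == node]

    def complete_paths(self, node=None):
        """All root-to-leaf paths, each given as a list of nodes."""
        node = self.root if node is None else node
        kids = self.children(node)
        if not kids:
            return [[node]]
        return [[node] + path
                for kid in kids for path in self.complete_paths(kid)]

    def is_well_formed(self):
        # Each parameter labels at most one node per complete path.
        for path in self.complete_paths():
            seen = set()
            for node in path:
                params = self.node_params.get(node, set())
                if seen & params:
                    return False
                seen |= params
        # e is defined exactly for the parameters of the parent node.
        for edge in self.edges:
            parent = edge[0]
            assigned = set(self.edge_values.get(edge, {}))
            if assigned != self.node_params.get(parent, set()):
                return False
        return True


# Assumed shape of the running example's tree (leaf names are hypothetical).
tree = DecisionTree(
    root="Application",
    edges=[("Application", "Language"),
           ("Language", "Manager1"), ("Language", "Manager2"),
           ("Manager1", "Leaf1"), ("Manager1", "Leaf2"),
           ("Manager2", "Leaf3"), ("Manager2", "Leaf4")],
    node_params={"Application": {"A"}, "Language": {"L"},
                 "Manager1": {"M"}, "Manager2": {"M"}},
    edge_values={("Application", "Language"): {"A": "AV"},
                 ("Language", "Manager1"): {"L": "Korean"},
                 ("Language", "Manager2"): {"L": "Japanese"},
                 ("Manager1", "Leaf1"): {"M": "phrase manager"},
                 ("Manager1", "Leaf2"): {"M": "prompts manager"},
                 ("Manager2", "Leaf3"): {"M": "phrase manager"},
                 ("Manager2", "Leaf4"): {"M": "prompts manager"}},
)
print(tree.is_well_formed(), len(tree.complete_paths()))
```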

Each product line member is characterized by its value assignment for the parameters
of variation, which we obtain by navigating through the decision tree. Our
generation method traverses the decision tree in a depth-first manner to collect both
test cases and associated parameter assignments. At the decision points the user is
prompted for his or her decision of parameter assignments⁵. We call the set of parameter
assignments that we collect while traversing the tree PA. To give an example,
if we are visiting decision point “Manager1” (Figure 3) and the user decides for
prompts manager, we would have the following parameter assignment (assuming that
the user has earlier decided for application AV and language Korean):

$PA = PA \cup \{M = \text{prompts-manager}\} = \{A = \text{AV},\ L = \text{Korean},\ M = \text{prompts-manager}\}$

The test generation algorithm works as follows (Figure 4)⁶. We employ a search
engine to search through the decision tree and prompt the user for input. If a parameter
of variation allows multiple assignments, more than one branch can be selected.
For instance, consider the decision node “Language” in Figure 3. Possible user
choices are {Korean}, {Japanese}, or even {Korean, Japanese}. If the user selects either
Korean or Japanese, we would only follow the left or right path, respectively.
Otherwise we would include both paths.

Initially, PA is set to be empty. We also create an empty set of test cases. The 
search engine pointer set includes initially only one element, namely the root of the 
decision tree combined with the empty set PA. During the search engine’s traversal of 
the decision tree, we add the selected decision points together with the corresponding 
parameter settings that we get from the user to the pointer set. PA changes while we 
traverse up and down through the tree and always represents the current parameter 



Test-Generation (D)
{
    (N, E, PV, V, n, e) = D;
    Test-Case-Set = {};
    Pointer-Set = { [root(E), {}] };
    PA = {};
    while (Pointer-Set ≠ ∅) {
        p = pick-an-element(Pointer-Set);
        [node-x, x-PA] = p;
        PA = x-PA;
        foreach ([node-x, y] ∈ E) {
            <v0; v1; ...; vn> = prompt-parameter-assignment(node-x);
            PA += <v0; v1; ...; vn>;
            if isLeaf(y) {
                T = instantiate-generic-tests(y, PA);
                addToSet(Test-Case-Set, T);
            } else
                addToSet(Pointer-Set, [y, PA]);
        }
    }
    return Test-Case-Set;
}

Fig. 4. Test Generation Algorithm 
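
For readers who prefer executable code, the algorithm of Figure 4 can be approximated in Python. This is a sketch under several assumptions, not the authors' tool: the tree encoding (EDGES, ATTACHED), the leaf names, and the choose callback that stands in for the interactive user prompt are all hypothetical. Unlike the figure, the sketch removes a pointer from the pointer set when it is picked and asks for the user's selection once per decision point.

```python
# Assumed encoding of the running example's decision tree.
# Inner node -> list of (value assignment on the edge, child node).
EDGES = {
    "Application": [({"A": "AV"}, "Language")],
    "Language": [({"L": "Korean"}, "Manager1"),
                 ({"L": "Japanese"}, "Manager2")],
    "Manager1": [({"M": "phrase manager"}, "Leaf1"),
                 ({"M": "prompts manager"}, "Leaf2")],
    "Manager2": [({"M": "phrase manager"}, "Leaf3")],
}

# Leaf node -> generic test cases attached to it (hypothetical attachment).
ATTACHED = {"Leaf1": ["TestCase2"], "Leaf2": ["TestCase1"],
            "Leaf3": ["TestCase2"]}


def generate_tests(root, choose):
    """Depth-first traversal collecting (generic test case, PA) pairs.

    choose(node, options) models the user prompt: it returns the subset of
    edge assignments that the user selects at the given decision point.
    """
    test_case_set = []
    pointer_set = [(root, {})]           # (decision point, PA so far)
    while pointer_set:
        (node, pa), pointer_set = pointer_set[0], pointer_set[1:]
        selected = choose(node, [values for values, _ in EDGES[node]])
        new_pointers = []
        for values, child in EDGES[node]:
            if values not in selected:
                continue
            new_pa = {**pa, **values}    # PA extended by the edge's assignment
            if child in ATTACHED:        # leaf level: instantiate attached tests
                for generic in ATTACHED[child]:
                    test_case_set.append((generic, new_pa))
            else:
                new_pointers.append((child, new_pa))
        # New pointers go to the front, which yields depth-first order.
        pointer_set = new_pointers + pointer_set
    return test_case_set


def choose(node, options):
    """Scripted user: A=AV, L in {Korean, Japanese}, M=phrase manager."""
    wanted = {"A": {"AV"}, "L": {"Korean", "Japanese"},
              "M": {"phrase manager"}}
    return [o for o in options
            if all(v in wanted[p] for p, v in o.items())]


for generic, pa in generate_tests("Application", choose):
    print(generic, pa)
```

With the scripted choices, the run yields TestCase2 twice, once for Korean and once for Japanese, each with A=AV and M=phrase manager.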



⁵ We plan to include parameter assignments of previous generations from which the user can
select. As we are leveraging the decision model tree developed for other phases of our
product line approach, we expect to leverage the tool support offered for that.
settings. When the search reaches leaf level, the attached generic test cases will be
instantiated with the current parameter assignment PA and added to the test case set. 
When the search engine pointer set is empty, the tree search ends and all test cases are 
generated. 

Table 1 illustrates the application of the generation algorithm to our running example
when the user provides the following input: first A=AV, then L=Korean, then
M=phrase manager, and finally L=Japanese. The table shows the
settings of the pointer list while we walk through the algorithm step by step. The
current pointer always points to the top element of the pointer list, which consists of
the current node and parameter settings. Applying relation E results in an updated
pointer list. When we reach leaf level we instantiate the attached generic test cases
with the current parameter settings and add them to the test case set. The table serves
to illustrate that applying the generation algorithm to the decision tree of Figure 3
generates those test cases of Figure 1 that actually deal with the phrase manager of the
AV application.



Table 1. Sample Test Case Generation for decision tree of Figure 3

| Current Pointer p: Node | Current Pointer p: PA | New Pointer-List: Node | New Pointer-List: PA | Test-Case-Set (additive) |
|---|---|---|---|---|
| Application | | Language | A=AV | |
| Language | A=AV | Manager1 | A=AV, L=Korean | |
| | | Manager2 | A=AV, L=Japanese | |
| Manager1 | A=AV, L=Korean | TestCase2 | A=AV, L=Korean, M=phrase manager | |
| | | Manager2 | A=AV, L=Japanese | |
| TestCase2 | A=AV, L=Korean, M=phrase manager | Manager2 | A=AV, L=Japanese | TestCase2-Korean (cf. Figure 1) |
| Manager2 | A=AV, L=Japanese | TestCase2 | A=AV, L=Japanese, M=phrase manager | |
| TestCase2 | A=AV, L=Japanese, M=phrase manager | | | TestCase2-Japanese (cf. Figure 1) |



⁶ We use a Z-like notation to describe the algorithm.



3 Validation

One of Avaya’s business units is currently transitioning to product-line engineering. 
We identified around 600 commonalities and 100 variabilities [15] for the family, and 
designed an architecture with an information hiding hierarchy [12] as key component. 
At the lowest level in the hierarchy we have around 200 modules, many of which 
already existed in some form in current Avaya products. Because we cannot afford to 
stop product development while transitioning, we decided to use an evolutionary 
approach and are following an incremental adoption strategy for re-engineering the 
product units’ legacy systems [7]. The product-line environment we are adopting 
allows us to select the proper set of modules, configure them, and then compose them 
in order to build a family member. We are currently developing a decision model to 
guide this configuration process based on the variabilities’ parameters of variation. 

While we first focused on adopting product-line engineering principles for re- 
quirements engineering, software architecture, and code generation, we did not adapt 
our testing practice right from the beginning. However, testing is an area with high 
potential for reuse and reducing the current testing effort is a high priority topic. 

We applied the proposed testing method to parts of the product line’s legacy ac- 
ceptance test suite and analyzed 30 test cases with altogether 174 description items 
and 98 result items that on average each span one page of textual description. By 
applying seven parameters to generalize the test cases, we could reduce the number of 
description items from 174 to 23 and the number of expected result items from 98 to 
18. This corresponds to a reduction of duplication effort of around 85% for the dis- 
cussed case study. We especially expect to see the benefits for maintaining the test 
cases, because we now only have “one-point-of-change” instead of having to apply 
the same change to several test cases. The generalization process of the test cases was 
straightforward: we went through the list of parameters of variation, applied them to 
generalize the existing test cases, and replaced each existing test case by its generic 
version. The intention of the case study was to define a test case family that can be 
used to generate exactly the set of originally existing test cases and no more. That
means we did not intend to apply our method as a completeness check of the existing
test suite and used constraints on the parameters to restrict the possible instantiations.
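
As a quick check of the figures above, and assuming that the quoted reduction of around 85% refers to description and result items taken together, the arithmetic works out as follows:

```latex
\[
\frac{174 - 23}{174} \approx 86.8\%, \qquad
\frac{98 - 18}{98} \approx 81.6\%, \qquad
\frac{(174 + 98) - (23 + 18)}{174 + 98} = \frac{231}{272} \approx 84.9\%
\]
```

The per-category reductions in this case study are thus higher than the average values of roughly 70% and 60% quoted in Section 2.1.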

The corresponding decision tree has 25 nodes (6 intermediate nodes and 19 leaf 
nodes), 24 branches and a maximum depth of 5 levels. We assigned the generic test 
cases to the tree’s leaves and used the decision tree for generating the test cases fol- 
lowing our generation algorithm. We were able to regenerate all originally existing 
test cases and did not generate any test cases that were not part of the original test 
suite. This demonstrates that we can restrict the number of description and result
items as prescribed by our method without losing information on the test suite.

First results were promising enough to extend the method’s application to further
test cases and to also design a tool suite to aid such a process. So far, the analysis
of the test cases, the construction of the test case family, and the generation of test
cases have been carried out manually with the MS Office suite as the only tool support.
We are in the process of designing a tool suite that also integrates with other product line
tools of our product line environment.
4 Related Work

Test case generation and maintenance is not only very important in testing software 
systems, but also a very time-consuming process that requires a great deal of expert 
knowledge of the systems being tested. It is imperative to provide a methodology to 
generate and maintain test cases automatically based on certain available information 
of the target system, i.e., to leverage existing information such as a decision model in 
product-line engineering. 

In product-line engineering we have a set of common assets (most importantly the 
modules) from which the family members can be assembled. Therefore, in this con- 
text black-box testing of the product plays a major role. Thorough functional testing 
of products improves the reliability of the entire family. 

Much research in the area of test generation has been focused on requirement 
model-based automatic generation. Many conventional methods require the users to 
create a new model solely for the purpose of test generation (e.g. TestMaster). This 
extra work certainly increases the difficulty of using these methods. In addition, it 
also opens the door for possible inconsistencies between the original system model 
and the newly created “test generation” model.

There are two types of models for automatic test generation: data model [13] and 
behavioral model. Examples of data model-based methods include the technologies 
used by TestMaster [18] and ATEG [17]. Examples of behavioral model-based ones 
are those using UML [14] or SDL [16]. Some attempts have also been made to use a
combination of data and behavioral models [8]. However, none of these methods takes
advantage of the reuse potential and domain knowledge that are available in
product-line development. For instance, we can consider test steps as reuse components
that can be shared among similar products. Test generation in this context can be 
automated without going through data or behavioral specification. 

Regarding testing within product-line engineering, [9] summarizes issues and related
solutions. It covers topics such as testing assets, testing products, and
core assets. It also mentions automatic test generation. However, the generation
described in the report uses formal requirements rather than existing assets such as a
decision model, and it does not mention automatic test generation from a decision model. The
work presented in [10] attempts to test the product-line architecture rather than products.
Architecture testing often deals with the correctness and the efficiency of the
architecture itself, while product testing focuses on the behavioral aspect of the product.
This paper presents an innovative approach for automatic test generation and 
maintenance of black-box functional tests. It is different from other methods, as it 
does not require the creation of any new model. Rather, it uses a decision model 
(which we need also for other purposes) and the test steps as components. Finally, the 
format for the generated tests is flexible depending on the user selected component 
formats. For example, they can be given in the XML format which can be input to a 
test driver with a standard XML parser for automatic test execution 
(http://www.telelogic.com/products/tau/ttcn/index.cfm). Topics related to test execu- 
tion are not included in this paper. 
5 Summary and Outlook

In this paper we introduced a method for defining a family of test cases by generaliz- 
ing existing (or new) test cases driven by the parameters of variation of the common- 
ality analysis. The result of this generalization is a set of generic test cases which 
reference generic description and result items including their constraints. We leverage 
the decision model for implementing the constraints by attaching a generic test case to 
those leaf nodes of the decision tree that represent valid parameter assignments for the 
generic test case. We generate the test cases from the decision tree by traversing the 
decision tree in a depth first manner to collect both test cases and associated parame- 
ter assignments. When necessary the user is prompted for his or her decision of pa- 
rameter assignments. Our method allows a significant reduction of the number of 
description and result items that need to be maintained, which was demonstrated by 
our case study.

Future work encompasses the following issues: 

Tool Support: Parts of our method (such as defining the constraints) are creative
processes, but can be assisted by appropriate tools. Other parts such as combining
constraints or instantiating the generic test cases can be automated. We are in the 
process of designing a tool suite that also integrates with other product line tools 
of our product line environment. For analyzing the test case family and con- 
structing the decision tree, the tool will assist in managing the set of parameters, 
constraints, and building blocks, identifying generalization potential, and as- 
signing the generalized test cases to the decision tree. For test case generation, 
the tool will allow the user to select parameters step by step while going through 
the decision tree. 

Test Case Optimization: As an additional advantage, our method can be used to 
optimize test case generation, i.e., to generate only a partial set of the test cases 
without having to browse through hundreds of text pages for identifying the rele- 
vant test cases. For instance, when we introduce a new release for one of the 
features, we want to focus on those test cases dealing with this feature and corre- 
sponding modules. This requires a suitable mapping of requirements variabilities 
and architectural variabilities which is discussed in more detail in [6]. Because 
we rarely test the complete test suite at once (effort would be too high), test case 
optimization is especially useful. 

Canonical Form: We made an important assumption for the course of this paper, 
namely that description and result items come in a certain canonical format. If an 
item does not refer to a parameter of variation from the commonality analysis (by 
not mentioning a possible value), we assume that this item is valid across all val- 
ues from the value range. For instance, item “Open prompts manager” does not 
refer to a specific application such as AV. As a consequence, we have to assume 
that we can open a prompts manager for all applications integrated with our 
product family, namely AV, BV, and CV. We are looking into canonical de- 
scription formats as future work. 






B. Geppert et al. 



References 

[1] Ardis, M., Daley, N., Hoffman, D., Siy, H., Weiss, D., “Software Product Lines: a Case 
Study”, Software Practice and Experience 30(7), 825-847, June 2000. 

[2] Bosch, J., Design and Use of Software Architectures - Adopting and Evolving a Product- 
Line Approach, Addison-Wesley, 2000 

[3] Bosch, J., “Software Product Lines: Organizational Alternatives”, 23rd International
Conference on Software Engineering (ICSE), 2001

[4] Clements, P., Northrop, L., Software Product Lines - Practices and Patterns, Addison 
Wesley, 2002 

[5] Geppert, B., Roessler, F., “Combining Product-Line Engineering with Options- 
Thinking”, Intern. Workshop on Product-Line Engineering: The Early Steps (PLEES01), 
2001 

[6] Geppert, B., Roessler, F., Weiss, D., “Consolidating Variability Models”, ICSE Workshop 
on Variability Management, International Conference on Software Engineering, Port- 
land, OR, USA, 2003 

[7] Geppert, B., Weiss, D., “Goal-Oriented Assessment of Product-Line Domains”, 9th
International Software Metrics Colloquium, Sydney, Australia, 2003

[8] Li, J. J., and Wong, W., “Automatic Test Generation from Communicating Extended 
Finite State Machine (CEFSM)-Based Models”, Proc. IEEE ISORC, 2002 

[9] McGregor, J.D., “Testing a Software Product Line”, Technical Report
CMU/SEI-2001-TR-022, ESC-TR-2001-022

[10] Muccini, H. and Hoek, A., “Towards Testing Product Line Architectures”, Electr. Notes 
in Theoretical Computer Science 82 No. 6 (2003). 
www.elsevier.nl/locate/emtcs/volume82.html 

[11] Parnas, D.; “On the Design and Development of Program Families”, in Software Funda- 
mentals, D. Hoffman and D. Weiss, Eds., Addison Wesley, 2001 

[12] Parnas, D., Clements, P., Weiss, D., “The Modular Structure of Complex Systems”, in 
Software Fundamentals, D. Hoffman and D. Weiss, Eds., Addison Wesley, 2001 

[13] Rapps, S. and E. J. Weyuker, “Selecting Software Test Data Using Data Flow
Information”, IEEE Trans. on Software Engineering, SE-11 (1985), pp. 367-375

[14] Rumbaugh, J., Jacobson, I., and Booch, G., UML Reference Manual, Addison-Wesley,
1998

[15] Weiss, D., Lai, C.T.R., Software Product-Line Engineering - A Family-Based Software 
Development Process, Addison Wesley, 1999 

[16] ITU-T Recommendation Z.100, CCITT Specification and Description Language (SDL), 
International Telecommunication Union (ITU), 2000 

[17] ATEG - http://aetgweb.argreenhouse.com (April 2004) 

[18] Testmaster - http://testmaster.rio.com (April 2004) 



TTCN-3 Language Characteristics in Producing 
Reusable Test Software 



Pekka Ruuska and Matti Karki 

VTT, Technical Research Centre of Finland, P.O.Box 1100, 
FI-90571 Oulu, Finland 

{pekka.ruuska, matti.karki}@vtt.fi



Abstract. TTCN-3 is a new programming language, which was especially de- 
veloped for testing. We analyzed how well the structure and the features of 
TTCN-3 conform to producing reusable test software. The analysis is mostly 
based on the conceptual model introduced in [1,2], the principles presented in 
[7,8,10] and our own understanding and experience of reusable software. Our 
conclusion is that TTCN-3 provides the basic language features for developing
reusable test software. The modular structure of the language and its controlled and
explicit interfaces promote reusability. Furthermore, the test-specific characteristics
of TTCN-3, which include its specific data types, expressions and test
configurations, support reusability as well. When TTCN-3 is used in conformance
testing of telecommunication protocols, the reusability potential of
TTCN-3 code is high. The more advanced reusability features that are required
for object-oriented programming are not currently supported in TTCN-3.



1 Introduction 

Software development methods, which rely on reusable software, should improve a 
system's reliability and security, shorten its development time and reduce the system's 
maintenance costs. Many recognized companies and organizations have adopted 
reuse of software into their standard software development methods. It is often con- 
sidered that properly reusable software can be produced only with object-oriented 
programming. However, software reuse and the system development methods that
promote reusability may be utilized with other types of languages and development
tools as well. One specific domain where the reuse of software can be extremely 
beneficial is the development of test software. 

The ITEA-TT-MEDAL Project studies reusability in the development of test soft- 
ware, enhances software testing methodologies and improves automatic testing proc- 
esses. Furthermore, the application of a new test specification language TTCN-3 and 
the design of generic reusable tests are among the research tasks of the project. 

In this paper we present a detailed analysis of how the structure and the features
of the TTCN-3 language support the reuse of software. The analysis is based on the
conceptual model introduced in [1,2] and the discussion presented in [10,11].
Furthermore, the principles presented in [7,8] and our own understanding and
experience of reusable software were used in the analysis. Our final goal is to
specify generic requirements for the design of reusable TTCN-3 tests.



2 Development of Reusable Test Software 

The design of the software test environment is an essential part of software
development work. When software is tested, it is executed with the intention of
finding errors and of determining whether the software meets its requirements [12].
In automatic testing, test cases are implemented with software. In the system
development phase, the testing system may enable the execution of tests by providing
simulations of the non-existent parts of the system. For these reasons, specific test
software is needed in the development of software-intensive systems [9].

The development of the test system and the design of the test software typically
proceed concurrently with the system development process. The specifications of the
test software derive from the system requirements of the actual system.

Test software may be developed employing the same methods as any other software.
Test software differs from other software only in that its specific purpose is to
find errors in systems under development or to ensure that systems function as
specified. Test software may be needed for verifying any type of system
characteristics or requirements, for instance to execute performance tests or to test
the security, fault-tolerance or reliability of a system.

A system test environment may in some cases be reusable for many product
generations. At the abstract test level, individual test cases are often reusable as
they are during the development process of a software-intensive system. Test cases
may also be designed for generic use and be reused with minor adaptations at later
testing phases of a system. However, only a few programming tools that support the
reusability of test software are available.



3 Reusability Features of Programming Languages 

All software has some potential for reuse; however, we are discussing here software
that is originally designed with reusability as a substantive design goal. Properly
managed and controlled reuse imposes certain requirements on the software development
tools and on the programming language. References [1,2,10] describe how the specific
characteristics of a programming language may promote or reduce its suitability for
developing reusable source code. Furthermore, [6,8] and our own experience of
developing reusable software confirm the principles suggested in [2]. We analyzed
TTCN-3's capability to support reusability by applying the conceptual model
introduced in [2]. The model analyzes potential non-contract and contract
dependencies between software components as well as the checkability, customizability
and flexibility of the source code produced.

It is often thought that developing reusable software requires object-oriented
programming and a programming language such as Eiffel, Smalltalk or C++, which
provide definition of classes, inheritance and aggregation. However, a practical
approach to the development of software from reusable components can be based on
using basic procedural languages and tools. In this approach it is important that the
software architecture, the software development organization and the software
development process promote reusability.





Fig. 1. A segment of code (Component A) is reusable, when it may use several different com- 
ponents or when various contexts may use a similar component (Component K) [2] 

We define reusable test software as test software, which is deliberately designed 
for reuse and which is reused either during a consecutive development phase of a 
software-intensive system, or in the development of some other system. The design of 
software for reuse requires extra effort and its productive use is possible only in a 
properly managed software development organization. 

We can name language features which promote or reduce the reusability of code in
practical cases. However, the reusability of code is difficult to define exactly.
According to [2], the reusability of a code segment is high if it can be usefully
invoked from diverse contexts, or if the code segment can usefully invoke diverse
components.



4 TTCN-3 Features' Impact on Reusability 

TTCN-3 (Testing and Test Control Notation) is a programming language with
test-specific extensions. The test-specific features of TTCN-3 are, for instance, the
handling of test verdicts and timers, the distribution of tester processes and
mechanisms for comparing the reactions of the system under test with the expected
ones. TTCN-3 also has a graphical presentation format (GFT) and a tabular
presentation format (TFT) [4]. TTCN-3 was developed as a joint effort of ETSI
(European Telecommunications Standards Institute) and ITU-T (the Telecommunication
Standardization Sector of the International Telecommunication Union).

TTCN (Tree and Tabular Combined Notation) and its later version TTCN-2 were
specified primarily for supporting the conformance testing of telecommunication
protocols. These test specification languages were limited to functional black box
testing and were not suitable for performance testing. TTCN-3 is much more generic,
although telecommunication clearly is its primary application domain. TTCN-2 and TTCN
were not real programming languages, although various automatic testing tools are
available that can utilize TTCN or TTCN-2 test specifications in automatic testing.

TTCN-3 presents a radical change from TTCN and TTCN-2. Even the name has changed,
although the acronym TTCN was retained to express the relationship with the
predecessors. TTCN-3 abandons the tabular form of TTCN and TTCN-2. TTCN-3 is a
text-based language that is used as a basis for document interchange [3].




Fig. 2. A TTCN-3 test suite: test cases are implemented as test functions, test verdicts may be 
used for controlling the test case execution 



4.1 Structure of TTCN-3 

The structure of TTCN-3 satisfies the first basic requirement of reusable software:
its architecture is completely modular. Modularity limits the visibility of functions
and variable definitions, which strongly reduces the generation of unintentional
dependencies between software components. It also helps to avoid name clashes. A
TTCN-3 module may import definitions from other modules; however, no sub-modules
exist. A TTCN-3 module consists of a definitions part and an optional control part. A
module which includes the definitions part only may be used as a generic header file
for other modules.

The control part of a module executes the test cases defined in the module
definitions part. Program statements are used to define test behavior, for instance
if-else statements, while loops or return statements. Test-specific statements
include trigger, catch (an exception) and clear (a port). Local variables and
constants can be declared. Variable values can be passed into test cases or functions
as parameters [4,5].

An example of a TTCN-3 module: 

module AnotherTestSuite100v04 {

    // A module with a definitions part:
    const integer MyConstant := 5;
    type record MessageTypeB { /*...*/ }
    template MessageTypeB AMessageFromTheNet := { /*...*/ }

    // import a single definition:
    import from GenericModule502 { type YetAnotherType };

    // import all definitions of a module:
    import from ModuleLibrary54 all;

    function MyNewFunction() { /*...*/ }
    function MyComplicatedFunction() { /*...*/ }

    type component MyCompZ { /*...*/ }

    testcase MyTestCase1001() runs on MyCompZ { /*...*/ }
    testcase FinalTestCase9061() runs on MyCompZ { /*...*/ }
    testcase YourTestCase2009(inout boolean MyParam)
        runs on MyCompZ { /*...*/ }

    // And a control part, which starts from here:
    control {

        // local variables:
        var boolean MyDecision := true;
        var boolean LowInputOpen := true;

        // A sequential execution of a test case:
        execute(MyTestCase1001());

        // A while loop:
        while (LowInputOpen) {
            execute(YourTestCase2009(LowInputOpen));
        }

        // A conditional execution of a test case:
        if (MyDecision) {
            execute(FinalTestCase9061());
        }

    } // End of control part

} // End of module AnotherTestSuite100v04

A TTCN-3 module may be parameterized. The parameters provide a pass-by-value
mechanism. A pass-by-reference mechanism exists as well, but its use is limited. In
[2,8,10] parameter mechanisms are regarded as advanced support for reusability, while
the pass-by-reference mechanism is criticized, as it makes the context depend on the
component as well as the component on the context. The strict module structure of
TTCN-3 and the confinement of test execution to the module control part mean that
dependencies between modules may occur only in the module definitions part or through
the parameter mechanisms. This provides strong support for reusability, as it reduces
unintentional and implicit dependencies between modules. The drawback is that the
architecture is not very flexible. However, it is asserted in [1,2,6,10,13] that a
disciplined rather than flexible programming style and a modular system architecture
are indispensable for the reuse of code.
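
The difference between the two parameter mechanisms can be illustrated with a minimal
sketch. The module and function names used here (ParameterSketch, NextValue,
CountDown) are hypothetical and are not taken from the standard or from [2]; the
sketch only assumes the in and inout parameter kinds of the core language.

```ttcn3
module ParameterSketch {

    // 'in' parameter (pass-by-value): the function works on a copy,
    // so the calling context is not affected by what happens inside.
    function NextValue(in integer p_value) return integer {
        return p_value + 1;
    }

    // 'inout' parameter (pass-by-reference): the function changes the
    // caller's variable, which couples the context to the component.
    function CountDown(inout integer p_counter) {
        p_counter := p_counter - 1;
    }

    control {
        var integer v_counter := 3;
        var integer v_next := NextValue(v_counter); // v_counter is unchanged
        CountDown(v_counter);                       // v_counter is modified
    }
}
```

Only the second call changes state in the calling context, which is the kind of
two-way dependency that the criticism in [2,8,10] refers to.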



4.2 Type Definitions of TTCN-3 

The typing system of TTCN-3 is compatible with the specification language ASN.1
(Abstract Syntax Notation One), which ETSI applies in its standard specifications.
The basic data types are familiar from many other programming languages, while there
are some telecommunications-specific extensions. Objectidentifier, octetstring,
record and set of are among the predefined data types of TTCN-3. A test-specific type
is the verdict type (verdicttype), which has the values none, pass, inconc
(inconclusive), fail and error.
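
As a hedged illustration of how verdicts can steer test execution, the following
sketch runs a second test case only if a first one has passed. The names
VerdictSketch, EmptyComp, TC_Precondition and TC_Main are hypothetical; the sketch
assumes only the verdict type (written verdicttype in the core notation) and the
setverdict and execute operations of the core language.

```ttcn3
module VerdictSketch {

    type component EmptyComp {}

    testcase TC_Precondition() runs on EmptyComp {
        setverdict(pass);
    }

    testcase TC_Main() runs on EmptyComp {
        setverdict(pass);
        setverdict(fail);
        setverdict(pass); // the verdict stays fail: pass cannot overwrite fail
    }

    control {
        // execute() returns the final verdict of the test case
        var verdicttype v_result := execute(TC_Precondition());

        // run the main test only if the precondition test passed
        if (v_result == pass) {
            execute(TC_Main());
        }
    }
}
```

Because a verdict can only get worse within a test case, a reused behavior function
cannot accidentally mask a failure detected earlier by its context.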

The type definitions of a TTCN-3 module may be reused in another module by using
import statements. There is no export statement. Other modules may import all TTCN-3
type definitions presented in the definitions part of a module. The import definition
together with the except definition allows creating new versions of definitions
presented in another module. Local variables may be used in the control part of a
TTCN-3 module. Global variables are not supported. Global variables are problematic
for reusability, because the dependencies that they create are difficult to check.
Global variables also produce potential name clashes [2]. The exclusion of global
variables and the introduction of the import and the except statements are features
which promote the reusability of TTCN-3 code.

TTCN-3 test values can be defined with templates. Templates may be used in
communication operations (send, receive), for instance to check whether a received
message has the expected value. Templates are placeholders for a single test value or
a set of test values. TTCN-3 provides matching mechanisms, which specify the value or
the value range that a specific field in a template may take. Fields can also be
defined as optional, which means that they need not be present in every message, or
it may be required that only their length matches [4,5]. Templates may be
parameterized and modified to create new templates from a parent template.

Templates and modified templates introduce elementary polymorphism in TTCN-3. For
reusability, controlled flexibility is beneficial; therefore the support of templates
promotes the reusability of TTCN-3 code.



An example of a message template:

template MyRecord AnOtherMessage := {

    // use of 'inside of value' matching mechanism
    field1 := "abc*cbd",

    // use of 'instead of value' matching mechanism
    field2 := *,

    // use of 'specific value' matching mechanism
    field3 := '200'H

}
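
Parameterization and modification of templates, as described above, might be sketched
as follows. The record definition and the template names BaseMessage and
RelaxedMessage are assumptions made for illustration only; the sketch relies on the
modifies keyword and on the rule that a modified template repeats the formal
parameters of its parent.

```ttcn3
module TemplateSketch {

    type record MyRecord {
        charstring field1,
        integer    field2 optional,
        hexstring  field3
    }

    // Parent template, parameterized with the expected value of field2.
    template MyRecord BaseMessage(integer p_value) := {
        field1 := "abc",
        field2 := p_value,
        field3 := '200'H
    }

    // Modified template: field1 and field2 are inherited from the parent,
    // only field3 is overridden and now accepts any value.
    template MyRecord RelaxedMessage(integer p_value) modifies BaseMessage := {
        field3 := ?
    }
}
```

A test suite can thus keep one carefully reviewed parent template and derive the
variants needed in individual test cases, instead of copying the whole definition.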



4.3 TTCN-3 Test Configurations 

The TTCN-3 standard specifies test configurations as well. A test configuration is the 
run-time environment of TTCN-3 components. Components are created from mod- 
ules. Communication between TTCN-3 components is allowed through ports, which 
must be introduced in the module definitions part. 

TTCN-3 allows dynamic as well as static specification of concurrent test configu- 
rations. A test configuration consists of a set of inter-connected test components. 
There is always precisely one main test component (MTC) and all other test compo- 
nents are parallel test components (PTC). 




Fig. 3. The MTC and the PTCs communicate with each other through connected ports;
through ports mapped to the test system interface, the components communicate with
the SUT (System Under Test)

TTCN-3 ports may be defined as message-based or procedure-based, or both types may
be allowed at the same time. Ports are directional; they are specified as in, out or
inout. Message-based ports define which messages or message types are allowed to pass
through a specific port. A procedure-based port allows the remote call of procedures
through a specific port. By using the keyword all it is possible to allow the remote
call of all procedures defined in a module [4,5].

An example of a message-based port:

type port AnotherPortType message {
    // A message-based port
    in MyMsgeType1, MsgType233;
    out ShortMessageType3;
    inout integer;
}

An example of a procedure-based port:

type port AnotherProcedurePortType procedure {
    // A procedure-based port
    out Proceed1, Procedure289;
}

An example of a component definition:

type component MyComponentType {
    port PortTypeOne PCO1;
    port PortTypeB PCO99;
}

Concurrent systems are more complex and challenging to develop than sequential 
systems. Concurrent systems are also problematic for reusability as they often intro- 
duce implicit dependencies. However, the concept of components and ports implies 
that the TTCN-3 environment supports concurrency in an explicit and very controlled 
way. It is widely understood [2,7,8,10,13] that explicit interfaces are crucial for reus- 
ability. 
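
The explicit handling of concurrency can be illustrated with a small, hedged sketch
of a dynamic test configuration. All type, port and function names below are
hypothetical, as are the "ping", "pong" and "done" messages and the five-second guard
time; the sketch assumes only the create, connect, map, start and done operations and
the alt statement of the core language.

```ttcn3
module ConfigSketch {

    type port MsgPort message { inout charstring; }

    type component MtcType    { port MsgPort coordPort; }
    type component PtcType    { port MsgPort coordPort; port MsgPort sutPort; }
    type component SystemType { port MsgPort sutPort; }

    // Behavior of the parallel test component (PTC).
    function PtcBehaviour() runs on PtcType {
        timer t_guard := 5.0;
        sutPort.send("ping");
        t_guard.start;
        alt {
            [] sutPort.receive("pong") { setverdict(pass); }
            [] sutPort.receive         { setverdict(fail); }
            [] t_guard.timeout         { setverdict(inconc); }
        }
        coordPort.send("done");
    }

    // The main test component (MTC) builds the configuration explicitly.
    testcase TC_PingPong() runs on MtcType system SystemType {
        var PtcType v_ptc := PtcType.create;
        connect(mtc:coordPort, v_ptc:coordPort); // component to component
        map(v_ptc:sutPort, system:sutPort);      // component to SUT
        v_ptc.start(PtcBehaviour());
        coordPort.receive("done");
        v_ptc.done;
    }
}
```

Every communication path in the sketch is visible in a connect or map statement, so
the dependencies of PtcBehaviour on its context can be checked from the component
type it runs on.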



Table 1. TTCN-3 reusability features summarized.

Reusability features in       Promotes /    How TTCN-3 handles the issue
core languages                reduces
                              reusability
----------------------------  ------------  ----------------------------------------
Checkability of dependencies  promotes      Dependencies between components are few
                                            and they are explicit.
Customizability of a          promotes      Modules and functions may be
component                                   parameterized, which provides
                                            controlled customization.
Customizability of a context  promotes      Import statement and templates.
Flexibility (minimized        promotes      Flexibility is controlled; functions are
dependencies)                               not visible to other modules unless they
                                            are explicitly imported.
Usability (e.g. number of     promotes      Usability of modules is restricted to
contexts that can usefully                  tests, while for that purpose reuse
invoke a component)                         potential is high.
Explicit support for          promotes      All dependencies are in the definitions
dependency                                  part, which makes them explicit.



5 Conclusions and Further Work

The basic structure of TTCN-3 implies that the principles of developing reusable
code were understood and respected in its design process. TTCN-3 supports named
procedures and procedure libraries, the type checking of procedure parameters and
user-defined data types, while the use of shared variables is restricted and
controlled. The language does not support global variables or submodules. All these
features are important for the reusability of code. TTCN-3 also provides templates
for defining test values as well as components and ports for keeping dependencies
between modules explicit. Avoidance of implicit dependencies (no global variables,
limitations in pass-by-reference mechanisms for parameters) and support of
checkability as well as keeping the interfaces and dependencies explicit are features
which strongly promote the reusability of source code. At the same time, support for
object-oriented features was excluded, which means that TTCN-3 programmers do not
need to learn the more sophisticated programming techniques. Simplicity is desirable
for test software, which is often used for testing extremely complex systems. TTCN-3
provides the basic support for reusability, but not the more advanced features. For
reusable tests, TTCN-3 offers specific facilities.



5.1 Further Work 

The introduction of parameterized data types and inheritance mechanisms into TTCN-3
has been under discussion within the TTCN-3 user community. However, developing
TTCN-3 to fully support object-oriented programming could lead to practical
difficulties. There is always the danger of the test software becoming so complex
that another testing environment is needed for finding errors in the test software.
Nevertheless, new features of TTCN-3 deserve further research and analysis.

Our work continues with the development of suitable methods and processes for making
optimal use of the specific features of the TTCN-3 language and for enhancing the
reuse of test software. Furthermore, we are going to test our methods in software
development work with real example cases. This research work may result in new
proposals to the TTCN-3 specifications.



Abstract. Testing is the most time-consuming activity in the software development
process. The effectiveness of software testing is primarily determined by the quality
of the testing process. Software reuse, when effectively applied, has been shown to
increase the productivity of a software process and enhance the quality of software
by the use of components already tested on a large scale. While reusability of
testing material and tests has a strong potential, few if any approaches have been
proposed that combine these two aspects. Reusability of testing materials is desired
when test development is complex and time-consuming. This is the case, for example,
in testing with test-specific languages, such as TTCN-3. To meet these needs, this
paper suggests a test development process model that takes software reuse techniques
and activities into account. This paper shows further that in order to produce
reusable test material, the software entities must be expressed in terms of features,
to which the test materials are attached. Also, the software components must be
designed with reuse in mind when reusable test material is desired. The scope of the
proposed test development approach is unit and integration testing, because the
outcome of higher levels of testing is typically dependent on the tester's subjective
judgment.



1 Introduction 

The quality of a software system is primarily determined by the quality of the
software process that produced it. Similarly, the quality and effectiveness of
software testing are primarily determined by the quality of the test processes used
[1]. A software testing process has two primary goals: to give a certain level of
confidence that the software meets, under specific conditions, its objectives, and to
detect faults in the software. All activities related to testing aim at achieving
these two objectives [2]. Automated testing with advanced languages, such as TTCN-3
[3], is typically used in testing critical components of a system, such as
application programming interfaces or communication protocol implementations.
Advanced languages are used for automating complex testing tasks, where the
correctness of the software is not dependent on a tester's subjective interpretation.
A test development process produces material, which in the case of advanced languages
is usually a set of automated test suites with related information. Reusability of
the produced testing material is desired when test development is complex and
time-consuming.

Software testing includes various activities that go beyond the execution of code.
Static testing is the process of reviewing documents and detecting errors without
executing the system. Dynamic testing, correspondingly, is the process of detecting
faults in the software by executing the software with appropriate test materials such
as automated scripts or test-specific software components [4]. This paper focuses on
the dynamic aspect of testing, and excludes the white-box techniques and static
reviews because we are primarily interested in automated testing with advanced
languages. This paper is of special interest to those who develop or plan to develop
reusable software components, and lack the process support for developing reusable
tests attached to the components.

Producing and taking advantage of reusable test material requires restructuring of a
conventional test development process. The process must support the activities of
determining what test material is to be designed as reusable, and how it is to be
reused. Since creating reusable tests is dependent on the software reuse approaches
used, the test development process must take them into account. Currently there are
no approaches that specifically address this issue. Selective retest techniques [5]
and other regression testing techniques [6] seek to reuse tests for evolving
software, but software reuse techniques are different by nature. Software components
are reused not only across different versions, but also across different software
products. Also, the built-in test (BIT) approach [7] addresses test reusability, but
is used for developing tests inside the source code. This approach is not suitable
when testing with advanced languages, or when testing third-party components. The
subject of developing reusable tests is still immature in the software domain, and
the intent of this paper is to produce a generic process model to partly overcome
this issue.

This paper is structured as follows: Chapter 2 describes the different views of test
processes, and their approaches for carrying out the testing activities. Chapter 3
focuses on software reuse, and presents the techniques that are used in reusing
software artifacts and that form the basis of our reusable testing approach. In
addition, the feature-based testing method is discussed. Chapter 4 presents the
reusable test development process model including the necessary software reuse
techniques. In Chapter 5, the future research challenges are discussed.



2 Generic Test Process Models 

Testing is typically embedded in the software development process, which is
prescribed by the development methodology used. It is possible, however, to consider
the testing process as a separate, supporting process that seeks to find defects in
the software and to verify that the software meets its functional and quality
requirements. Viewing the testing activities as a defined process has advantages. A
defined process enables goal-oriented behavior from a group of individuals, and
increases the consistency and repeatability of all activities. Process thinking also
enables the measurability of the process, and continuous improvement activities [8].
Software engineering in general is heavily dependent on individuals, but when groups
become larger, the need for a defined practice of doing testing increases. It has
been shown that larger organizations tend to stress the importance and value of
well-defined work instructions, while smaller organizations seem to depend more on
experienced individuals [9].

A common defect detection strategy is to subject a software product to several
phases of testing [10]. The V-model of testing binds the phases of the Waterfall
development model with the phases of testing [11]. This view of the testing process
suggests that test phases are planned at particular phases of the software
development, and then tests are carried out against the results of the corresponding
development phase. The V-model includes four test phases, namely unit testing,
integration testing, system testing, and acceptance testing. The V-model itself is an
abstract guideline for dividing the testing process into certain steps, and it
requires refinement to turn into a set of concrete testing activities.

Another view of the testing process is that of the Test Management Approach (TMap).
This approach does not divide testing into vertical levels, but divides the overall
testing process into five consecutive steps. The steps of the TMap are depicted in
Figure 1 [12].



[Figure: the steps Planning & Control, Preparation, Specification, Execution and
Completion]

Fig. 1. Steps of the TMap process model.

The TMap concentrates on the organization of the overall testing process and the
testers instead of focusing on technical aspects of testing. The TMap puts a
particular weight on the planning and preparation phases, because they are seen as
crucial for achieving a high-quality testing process. Planning includes the testing
strategies, organization and test management. The control activities span the whole
testing process. The preparation phase includes the organization of subsystems,
training of testers, and the familiarization with the system specifications. The
specification phase includes the development of test cases and test infrastructure.
The execution phase includes pre-testing and the actual execution, where the tests
are run according to the designed test strategy. In the completion phase, the
consistency of testware and software specifications is evaluated.



3 Software Reuse 

The historical view of software reuse is the creation of software systems from
existing software elements rather than building systems from scratch [13]. Karlsson
[14] divides the process of software reuse into two complementary sides: development
for reuse and development with reuse. In this viewpoint, particular weight is put on
the fact that reusability of software is only achieved when it is planned in the
requirements specification and design phases. Reusable elements in the software
process are not limited to source code, but may also include other materials produced
in the development process.

Jacobson et al. [15] present an incremental adoption model for software reuse that
emphasizes the assumption that process development always involves learning, and that
a high-quality process cannot be established in an instant. This model is divided
into six steps, where a higher level always indicates a more mature process. The
maturity of software reuse contributes to faster time-to-market, higher-quality
systems, and lower overall development costs. As organizations move up from stage to
stage, they achieve the gains of reuse, and feel the pressure for more improvement.
The transition from no reuse to informal reuse occurs when the developers feel the
need to reuse components in order to reduce time to market. Informal reuse typically
leads to maintenance problems, which are addressed by the transition to black-box
reuse. Black-box reuse works for many organizations, but there is often a need for
more systematic reuse. In managed workproduct reuse, the creation and reuse of
components are explicitly managed by a distinct organization. At the next level,
systems and their components are designed at high levels to fit together. This
approach leads to reusing larger software structures than individual components. In a
reuse-driven organization, cultural aspects are taken into account, and management,
processes, and organization are all committed to achieving high levels of reuse.

Most mature reuse processes use the ideas of product-line architectures [16] and
domain analysis [17] to discover families of products that share most of their
components with other products in the same family. In order to achieve the high
levels of maturity in software reuse, an organization needs a systematic approach
that focuses on a set of projects in an application area, rather than focusing on one
particular project. This, in turn, requires that the projects possess some common
characteristics. Thus, it is not always suitable to pursue the highest levels of
reuse.

3.1 Development for Reuse 

Development for reuse is the planned activity of constructing a component for reuse
in contexts other than the one for which it was initially intended [14, p.256]. The
ability to reuse a component requires that it fulfills a reuser's need for a specific
functionality. The emphasis in development for reuse is on the planning and
architectural design of systems, because highly coupled components cannot be
transferred into other contexts. Strategic planning is required for stating where the
components can potentially be reused. Without it, an organization would waste effort
in cataloging assets, but never using them in a systematic way [18]. Thus,
development for reuse also addresses the packaging and stocking of components in a
systematic manner.

Five techniques are presented which may be used in developing reusable components
[14].

- Widening means identifying a set of requirements that are not contradictory, then
  making a general component that satisfies all of them. This technique has obvious
  disadvantages, as the reused components may carry features that are not needed in
  other contexts.

- Narrowing means identifying functionality common to several customers which can be
  represented by an abstract component. Inheritance is an example of narrowing in
  object-oriented systems. Narrowed components include common features, which are
  implemented, and abstract client-specific features, which are to be implemented by
  the reuser. Narrowing is commonly used in object-oriented frameworks, where certain
  parts of a system are implemented by the framework, and the application-specific
  functionality is implemented by the user by inheriting abstract classes of the
  framework.

- Isolation means isolating certain requirements to a small part of the system, so
  that the rest of the system can be constructed relatively independently. Isolation
  is popular in layered architectures, where distinctive layers are isolated from
  each other. In the Symbian OS, dynamic link libraries (DLLs) are used to isolate
  certain requirements, and they can be developed independently [19].

- Configurability means making a set of smaller components that can be configured or
  composed in different ways to satisfy different requirements. Certain design
  patterns [20] are examples of structures that can be used to achieve
  configurability. Several patterns enable the user to configure object structures
  and behavior at run-time, thus enabling configurability of the system
  functionality.

- Generators. Different requirements can be satisfied by making a "new"
  application-domain-specific language with which one can describe an application.

The first four of these techniques are considered in this paper. As generators deal 
with creating new programming languages, they are omitted from our approach. 
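
As an illustration of configurability, the following minimal Python sketch composes a
component from smaller interchangeable parts at run-time. The names Logger, plain and
shouting are hypothetical and do not come from [14]; the sketch only shows how the same
parts can be configured to satisfy different requirements.

```python
from typing import Callable, List


# Small, independently reusable parts (hypothetical names).
def plain(msg: str) -> str:
    return msg


def shouting(msg: str) -> str:
    return msg.upper() + "!"


class Logger:
    """Configured at run-time by composing a formatter and a sink."""

    def __init__(self, formatter: Callable[[str], str],
                 sink: Callable[[str], None]) -> None:
        self.formatter = formatter
        self.sink = sink

    def log(self, msg: str) -> None:
        self.sink(self.formatter(msg))


# Two configurations built from the same parts.
quiet_out: List[str] = []
loud_out: List[str] = []
Logger(plain, quiet_out.append).log("build ok")
Logger(shouting, loud_out.append).log("build ok")
```

Neither configuration required a change to Logger itself; only the composition differs.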

3.2 Development with Reuse 

Software development with reuse is the development of systems using existing
reusable components [21]. Reusing high-level components leads to the reuse of all
associated information, whereas reusing low-level components, such as general data
structures, contributes mainly to the quality of the applications [14, p.334].

When developing with reuse, the software development life-cycle has to be modified
to support reuse. Karlsson [14] suggests seven reuse-specific activities that could
be wholly or partly associated with the development process:

1) Identifying component requirements, which consists of understanding the application
and determining the software components needed to implement the solution.
2) Searching for reusable components that meet the identified requirements from a
repository or from outside the organization.
3) Understanding the candidate components’ functional and non-functional aspects.
4) Investigating how to adapt the candidate components into the final system. This
includes understanding the modifications needed to the component, the system under
development, or the requirements.
5) Selecting a component for reuse, which consists of choosing a solution from those
possible using the candidate components. This decision can be based on various
criteria, such as cost, legal aspects, quality, and so on.
6) Adapting the selected components into the system under development.
7) Reporting on the reuse of a component. Reporting is said to contribute to
reusability because the reuse experience is stored together with the component’s
documentation.

3.3 Test Reusability

Korhonen et al. [22] define a method for reusing tests for configured software
products, which is based on feature-based software development and regression
testing. Feature-based development views a software product as a set of features, from
which new configurations and versions can be built rapidly. Features can be either common
to all products in a product family, or specific to a single product. The method proposes
linking product features with test materials, i.e., the test materials are produced for
each implemented feature. In the feature-based testing method, test components and
product feature descriptions are used for selecting, modifying and configuring the
tests. Configurability of test suites typically requires scripting languages that
support parametrization of test cases. Test suites are divided into a
common part and a product-specific part.
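
The linking of features to test materials can be sketched as a simple lookup. The
following Python fragment is a minimal illustration, not the method of [22]; the
feature names, test names and the select_tests function are assumptions made for the
example.

```python
from typing import Dict, List, Set

# Test materials linked to the feature they exercise (hypothetical names).
TESTS_BY_FEATURE: Dict[str, List[str]] = {
    "login": ["test_valid_login", "test_invalid_password"],
    "search": ["test_search_by_name"],
    "export_pdf": ["test_pdf_layout"],
}

# Features shared by every product in the family (assumed for the example).
COMMON_FEATURES: Set[str] = {"login", "search"}


def select_tests(product_features: Set[str]) -> Dict[str, List[str]]:
    """Split a product's test suite into a common and a product-specific part."""
    suite: Dict[str, List[str]] = {"common": [], "product_specific": []}
    for feature in sorted(product_features):
        part = "common" if feature in COMMON_FEATURES else "product_specific"
        suite[part].extend(TESTS_BY_FEATURE.get(feature, []))
    return suite
```

Configuring a new product then amounts to passing its feature set to select_tests;
the common part of the suite is reused unchanged across products.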

Test reuse differs from software reuse in that tests cannot be directly reused;
they are always bound to the software components, interfaces, or features that are the
subject of testing. In order to produce reusable tests, software components have to be
designed to be reusable and transferable into other contexts.



4 Reuse in the Test Development Process 

This section describes the process model for test development, which includes the for 
reuse techniques and with reuse activities. The proposed model is depicted in Fig- 
ure 2. 

The test development process is divided into five steps, starting from planning and 
ending in a results analysis phase. The objective of a generic process model is to be 
adaptable to any kind of development approach, or stage in the software life cycle. 
The process model is developed according to the process definition by Humphrey 
[23]. Certain outcomes of the process steps are adopted from the IEEE standard for 
software test documentation [24], though the standard includes a lot of documentation 
overhead and is not suggested to be used as is [25]. The following sections describe 
the steps of the proposed test development process. 

Test planning. Test planning can be started when the initial software features are
determined. A test plan defines at least the scope, approach, resources, and schedule
for the test development process. Scope definition includes a summary of the software
items and features to be tested. The testing approach includes the major testing tasks, and
describes how the software items are to be tested at a general level. It also includes
initial pass/fail criteria for each software item [24].

Test suite design. Test suite design is the process of refining the planned test
approach, identifying the actual features to be tested, and designing the actual
implementation and execution steps. Precise pass/fail criteria are specified for each feature
under test. Test suite design produces a set of test cases with a rationale for selecting
each case. Also, a technical design is produced for the test suites, which may include
test scripts, software stubs, drivers, etc. For each test case, at least the input
specification, output specification, and environmental needs shall be described [24].

Fig. 2. Test development process for and with reuse.



Test suite implementation. Test suite implementation is a straightforward
process of producing the test material and setting up the test infrastructure.
Implementation also includes designing a test log, which identifies and stores all significant
activities and events in the execution phase [24]. Implementing test suites should not
generate further design decisions.

Test execution. Test execution is a controlled and repeatable process, which
produces results from the implemented test material. Execution results are typically
documented in an incident report, which includes any event that occurs during the
testing process and requires further investigation. For each incident, the impacts on test
plans, test design specifications, test cases, and so on shall be documented [24].

Test results analysis. A results analysis concludes the testing process if no
modifications are required to the test materials or the test subject. Analyzing the test
process itself is useful when the test materials must be maintained with the software
items.

The iterations in the process model indicate that the process shall be at least partly
restarted when issues are discovered in the testing process. A need to restart the
test planning phase occurs when the test coverage is not extensive enough, or the test
approach taken is not suitable. The test suite design phase is restarted if the test cases
are insufficient, or they contain technical design flaws. The implementation phase, in
turn, is restarted when the test cases require modifications for their implementation,
or the test environment requires modifications.

4.1 The For Reuse Process

The reuse technique in use places some additional requirements on the test planning and
test suite design phases. Karlsson [14] describes the development of test materials in the
context of reuse techniques.

- When Widening is used, test material is required for all features of the component. 
Thus, when reusing a component, the test material does not necessarily require up- 
dates, because the goal of widening is to reuse the component as is. 

- When Narrowing is used, test material is produced for the common features of the 
component, and abstract test cases or dummy drivers are created for abstract fea- 
tures. When abstract features are implemented in a later development phase, the 
abstract test material requires realizations. For example, when a class is inherited 
in object-oriented systems, new test material is needed for the derived class. 

- When Isolation is used, test material is produced for the isolated features and the
component’s external interfaces. When new interfaces are introduced, or the existing
ones are modified, new or modified test material is required for them.

- Karlsson [14] argues that the testing task is very hard to automate when
configurability is used, because components can create dynamic connections with other
components. Test material can be produced for each component, but deducing
system tests from the test cases of individual components is very difficult. Karlsson
suggests that the test materials include guidelines for testing all communication
mechanisms that the component provides. Another possibility could be to create
test materials for each configured feature. In practice, this means that all small
items, such as classes, are unit-tested alone, and then test cases are created for the
features that are implemented by combining these classes.
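
The treatment of narrowing above, where abstract test material is realized once the
abstract features are implemented, can be sketched with Python's unittest module. This
is a minimal illustration under assumed names (Shape, ShapeTestMixin, Square); it is
not taken from [14].

```python
import abc
import unittest


class Shape(abc.ABC):
    """Narrowed component: describe() is common, area() is left to the reuser."""

    @abc.abstractmethod
    def area(self) -> float:
        ...

    def describe(self) -> str:
        return f"{type(self).__name__} with area {self.area():.1f}"


class ShapeTestMixin:
    """Abstract test material: reusable checks without a concrete subject."""

    def make_shape(self) -> Shape:  # realized by the reuser
        raise NotImplementedError

    def test_area_is_non_negative(self):
        self.assertGreaterEqual(self.make_shape().area(), 0.0)

    def test_describe_mentions_class_name(self):
        shape = self.make_shape()
        self.assertIn(type(shape).__name__, shape.describe())


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class SquareTest(ShapeTestMixin, unittest.TestCase):
    """Realization of the abstract tests, plus a test specific to Square."""

    def make_shape(self) -> Shape:
        return Square(2.0)

    def test_area_of_square(self):
        self.assertEqual(self.make_shape().area(), 4.0)
```

The mixin is deliberately not a TestCase, so the abstract tests run only through a
realization such as SquareTest; each new derived class needs only its own realization.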

When reusability is desired in the test development process, the test planning phase
starts at the same time as architectural design. This is when the components, their
connectors and dependencies are defined. At this phase, the component’s features
shall be divided into common and product-specific ones, if suitable. High-level test plans
are created for the high-level software items, and the developer determines how the
reusable test material is to be created. At the test suite design phase, the reuse
technique in use imposes some constraints on the test design, as discussed above. Test
implementation and execution are straightforward activities also in reusable test
development. Analyzing the test results is important when aiming for historical reuse,
because test suites might require modifications after a testing cycle. The analysis
provides feedback to the previous phases on test suite quality, test approach suitability,
correctness of the test implementations, and the coverage of the test plan.

4.2 The With Reuse Process 

Testing a reusable component has to be performed by both the producer and the
reuser. The effort needed to reuse a component will be reduced if good test
documentation and other material is provided with the component [14, p.267]. The process
model in Figure 2 combines the with reuse software development activities with the
test development process. Test planning for a component can be started when a
component is selected for reuse. Implementing adaptations in a new context, in turn, starts
the test suite design phase. At this phase, the reuse technique in use affects the test
suite designs, as discussed in section 4.1. Test planning and test adaptations are
required when a component is transferred into another context. When it comes to
regression testing, test suites created for an older version of a component are also to
some extent suitable for a newer version. The challenge at this stage of development
is to keep the software and its tests consistent, so that the tests do not become
obsolete because they were not updated when necessary.



5 Conclusions and Further Research Needs 

This paper has shown that introducing software reuse in a development process also
entails modifications in test development, when reusability is desired for the
related test materials. A generic test development process model was adapted in the
context of software reuse techniques on the ‘for’ side, and activities on the ‘with’ side.
Modern software reuse techniques emphasize reuse of knowledge at all levels. Thus,
there are many items other than code fragments that can potentially be reused. When
considering reuse from the testing point of view, however, it must be noted that test suites
cannot be independent of software components, features, interfaces, and so on.
They are always bound to the software items that are the subject of testing.

The process model introduced is generic by nature, which is purposeful, but means 
that it requires adaptations to be used in an industrial testing environment. Consider- 
ing the proposed test development approach in an industrial environment requires that 
the client organization implements both the for and with sides of software reuse. 
Thus, there must be at least some form of systematic reuse. There are two possible 
approaches on how to adapt the process model in an industrial testing environment. 
The first option is to use a model-driven approach, where the proposed test develop- 
ment process forms the basis for reusable test development. The second option is to 
upgrade an existing test development process to include software reuse aspects, as 
proposed in this paper. 

This paper has contributed an initial approach for achieving higher levels of
reusability in test development. The intent was to produce a starting-point for creating
reusable test suites. Later, this approach will be validated in an industrial testing
environment.



Abstract. Analyzing commonalities and variabilities among products of a
product line is an essential activity for product line asset development. A fea- 
ture-oriented approach to commonality and variability analysis (called feature 
modeling) has been used extensively for product line engineering. Feature 
modeling mainly focuses on identifying commonalities and variabilities among 
products of a product line and organizing them in terms of structural relation- 
ships (e.g., aggregation and generalization) and configuration dependencies 
(e.g., required and excluded). Although the structural relationships and con- 
figuration dependencies are essential inputs to product line asset development, 
they are not sufficient to develop reusable and adaptable product line assets. 
Other types of dependencies among features also have significant influences on 
the design of product line assets. In this paper, we extend the feature modeling 
to analyze feature dependencies that are useful in the design of reusable and 
adaptable product line components, and present design guidelines based on the 
extended model. An elevator control software example is used to illustrate the 
concept of the proposed method. 



1 Introduction 

Product line software engineering (PLSE) [1] is an emerging software engineering 
paradigm, which guides organizations toward the development of products through 
the strategic reuse of product line assets (e.g., architectures and software compo- 
nents). In order to develop product line assets, commonalities and variabilities among 
products must be analyzed thoroughly, as commonalities can be exploited to design 
reusable assets and variabilities can be used for the design of adaptable and
configurable assets.

A feature-oriented approach to commonality and variability analysis has been used
extensively in product line engineering [2], [5], [6], [10], [16], [17]. This is mainly 
due to the fact that the feature-oriented analysis is an effective way of identifying and 
modeling variabilities among products of a product line. In addition, the analysis 
result, a feature model, plays a central role in configuring multiple products of a 
product line. 

In the feature-oriented approach, variable features are considered units of variation
(i.e., addition or deletion) in requirements. A variable feature, if not properly
designed, may affect a large part of product line assets. If features are independent of
each other, each of them can be developed in isolation, and effects of a variation can be
localized to the corresponding component. However, as features usually are not
independent, a feature variation may cause changes to many components implementing
other features. For example, in an elevator product line, the Automatic Door Close
feature, which automatically closes doors when a predefined time has elapsed after the
doors were opened, can be active during the operation of the Passenger Service feature.
However, the Automatic Door Close feature must not be active while the VIP Service¹
feature is active, as the VIP Service feature allows doors to be closed only when the
door close button is pressed. That is, activation or deactivation of Automatic Door
Close depends on the activation state of Passenger Service or VIP Service. If the
activation dependency is embedded in the component implementing the Automatic
Door Close feature (i.e., the component checks activation of Passenger Service or
VIP Service before executing its functionality), the inclusion or exclusion of VIP
Service causes changes to the component for Automatic Door Close. Moreover, if a
newly introduced feature, Fire Fighter Service², also does not allow the Automatic
Door Close feature to be active during its operation, the inclusion of the Fire Fighter
Service feature also causes a change to the component for Automatic Door Close.

Feature dependencies have significant implications in product line asset develop- 
ment. In the product line context, products of a product line must be derivable from 
product line assets. Product line assets must be designed so that inclusion or exclusion 
of variable features causes few changes to components implementing other features.
In order to achieve this goal, various dependencies that variable features have with 
other features must be analyzed thoroughly before designing product line assets. With 
this understanding, product line assets must be designed so that variations of a feature 
can be localized to one or a few components. 



¹ VIP Service indicates a driving service an elevator provides for VIPs. When it is activated by
the landing key switch, a car moves to a predefined floor and waits for a car call after
opening doors. If a car call is entered and the door close button is pressed, the car will serve the
car call. When no car call is entered within the preset time, the car will close its doors and
resume Passenger Service.

² Fire Fighter Service may be activated using a key switch in a car and places an elevator
under the control of a fire fighter. The behavior of this feature must be in compliance with
the local laws. In this paper, we assume that the car starts traveling while a car call button is
being pressed and stops at the requested floor with doors closed. The doors can be opened
only while the door open button is being pressed. If the button is released, the doors are
closed immediately.

Although understanding feature dependencies is critical in product line asset
development, they have not been analyzed and documented explicitly. In this paper, we
extend the feature analysis method to include a feature dependency analysis as well as
a commonality and variability analysis. This paper introduces six types of feature
dependencies and discusses what engineering implications each class of feature
dependencies has in the design of reusable and adaptable product line assets. Based on
this understanding, we present component design guidelines. An elevator control
software (ECS) product line example is used throughout this paper to illustrate the
concepts and the applicability of the proposed method.

An overview of the feature modeling in [2], [9], [14] is given in the following sec- 
tion. 



2 Feature Modeling Overview 

Feature modeling is the activity of identifying commonalities and variabilities of the 
products of a product line in terms of features and organizing them into a feature 
model. The structural aspect of a feature model is organized using two types of rela- 
tionships: aggregation and generalization relationships. The aggregation relationship 
is used if a feature can be decomposed into a set of constituent features or a collection 
of features can be composed into an aggregate feature. In cases where a feature can be 
specialized into more specific ones with additional information or a set of features can 
be abstracted into a more general one, they are organized using the generalization 
relationship. 

Based on this structure, commonalities among all products of a product line are 
modeled as common features, while variabilities among products are modeled as 
variable features, from which product specific features can be selected for a particular 
product. Variable features largely fall into three categories: alternative, OR, and op- 
tional features. Alternative features indicate a set of features, from which only one 
must be present in a product; OR features represent a set of features, from which at 
least one must be present in a product; optional features mean features that may or 
may not be present in a product. In the feature model, selection of variable features is 
further constrained by “required” or “excluded” configuration dependencies. The 
“required configuration dependency” between two features means that when one of
them is selected for a product, the other must also be present in the same product. An
“excluded configuration dependency” between two features means that they cannot be
present in the same product.
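
The selection rules above can be checked mechanically. The following Python sketch is
one possible reading, not part of the feature modeling method itself: the function
check_configuration and its parameters are hypothetical, the "required" dependency is
assumed to be directional (the first feature needs the second), and every group is
assumed to belong to a parent feature that is present in the product.

```python
from typing import List, Set, Tuple


def check_configuration(
    selected: Set[str],
    alternative_groups: List[Set[str]],
    or_groups: List[Set[str]],
    required: List[Tuple[str, str]],
    excluded: List[Tuple[str, str]],
) -> List[str]:
    """Return the violated constraints (empty if the product is valid)."""
    errors: List[str] = []
    for group in alternative_groups:      # exactly one must be present
        if len(selected & group) != 1:
            errors.append(f"alternative: pick exactly one of {sorted(group)}")
    for group in or_groups:               # at least one must be present
        if not selected & group:
            errors.append(f"OR: pick at least one of {sorted(group)}")
    for feature, needed in required:      # assumed directional
        if feature in selected and needed not in selected:
            errors.append(f"required: {feature} needs {needed}")
    for a, b in excluded:                 # must not coexist
        if a in selected and b in selected:
            errors.append(f"excluded: {a} conflicts with {b}")
    return errors
```

Optional features need no rule of their own, since they may or may not be present;
they are constrained only through the required and excluded dependencies.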

The commonality and variability information manifested in the feature model
serves as a basis for developing product line assets. However, the commonality and
variability information is not sufficient to develop reusable and adaptable product line
assets. The next section presents a feature dependency analysis, which is
indispensable for the development of reusable and adaptable product line assets.



3 Feature Dependency Analysis

Feature modeling in [2], [9], [14] mainly focuses on the structural relationships and 
the configuration dependencies between features. Although structural relationships 
and configuration dependencies are essential inputs to the development and configu- 
ration of product line assets, operational dependencies among features also have sig- 
nificant implications in the development of product line assets. By operational de- 
pendency, we mean implicitly or explicitly created relationships between features 
during the operation of the system in such a way that the operation of one feature is 
dependent on those of other features. Thus, this section introduces operational de- 
pendencies that have significant influences on product line asset development. Each 
of these is described below. 

Usage dependency: A feature may depend on other features for its correct
functioning or implementation. For example, in the elevator control software, the
Direction Control feature depends on the Position Control feature, as it requires current
position data from the Position Control feature to compute the next movement
direction of an elevator. Moreover, the Position Control feature depends on the Position
Sensor feature, as it requires sensor inputs from the Position Sensor feature to
calculate the current position of an elevator. If one feature (a client) requires another feature
(a supplier) for its correct functioning or implementation, we say that the first
feature depends on the second feature in terms of Usage dependency.

Modification dependency: The behavior of a feature (a modifyee) may be modified
by another feature (a modifier) while it is active. For example, in the elevator
control software, the Registered Floor Stop feature determines the landing floors of
an elevator. The Load Weighing Bypass feature, however, changes the behavior of the
Registered Floor Stop feature by ignoring hall calls when a car is loaded to a
predetermined level of capacity. Modification dependency between two features means that
the behavior of a modifyee may be modified by a modifier while it is active. If
the modifier is not active, it does not affect the behavior of its related modifyee
features.
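
The Load Weighing Bypass example can be sketched as follows. This is only an
illustration of a Modification dependency under assumed names and values: the class
RegisteredFloorStop, the function load_weighing_bypass, the representation of calls as
sets of floor numbers, and the 0.8 load threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set

HallCallFilter = Callable[[Set[int]], Set[int]]


@dataclass
class RegisteredFloorStop:
    """Modifyee: decides whether the car stops at a floor."""

    modifiers: List[HallCallFilter] = field(default_factory=list)

    def should_stop(self, floor: int, car_calls: Set[int],
                    hall_calls: Set[int]) -> bool:
        for modify in self.modifiers:   # active modifiers adjust the behavior
            hall_calls = modify(hall_calls)
        return floor in car_calls or floor in hall_calls


def load_weighing_bypass(load_ratio: float,
                         threshold: float = 0.8) -> HallCallFilter:
    """Modifier: drops all hall calls once the load reaches the threshold."""

    def modify(hall_calls: Set[int]) -> Set[int]:
        return set() if load_ratio >= threshold else hall_calls

    return modify
```

With no modifier registered, RegisteredFloorStop behaves as before, which mirrors the
statement that an inactive modifier does not affect its modifyee.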

A feature must be active before it can provide its functionality to users. The
activation of a feature may depend on that of another feature. Activation dependency can be
classified into four categories: Exclusive-Activation, Subordinate-Activation,
Concurrent-Activation, and Sequential-Activation, each of which is described below.

Exclusive-Activation dependency: Some features must not be active at the same 
time. For example, in the elevator control software, the Passenger Service feature and 
the Fire Fighter Service feature must not be active simultaneously, as only one of 
them can be provided to users at a time. Exclusive-Activation dependency between 
two features means that they must not be active at the same time. 

Subordinate-Activation dependency: There may be a feature that can be active
only while another feature is active. For example, in the elevator control software, the
Passenger Service feature consists of several operation features (e.g., Call Handling³,
Door Control⁴, and Run Control⁵). These features can be active while the Passenger
Service feature is active. Subordinate-Activation dependency between two features
means that one feature (a subordinator) can be active while the other feature (a
superior) is active, but must not be active while its superior is inactive. Note that a
subordinator is not necessarily active during the operation of its superior. Also, subordinators
may be in Subordinate-Activation dependency with more than one superior. For
example, Call Handling, Door Control, and Run Control can be active during the
operation of Fire Fighter Service as well as Passenger Service. In case subordinators
depend on activation of more than one superior, they can be active while at least one
superior is active.

³ Call Handling is an operation for registering or canceling passengers’ call requests.

Subordinators may further depend on each other in terms of concurrency or se- 
quence. 

Concurrent-Activation dependency: Some subordinators of a superior may have to 
be active concurrently with each other while the superior is active. For example, Call 
Handling and Run Control must be active concurrently while Passenger Service is 
active, as an elevator must be able to register call requests from passengers and con- 
trol the run of the elevator concurrently during the operation of Passenger Service. 
Therefore, Concurrent-Activation dependency defined for subordinators of a superior 
means that the subordinators must be active concurrently while the superior is active. 

Sequential-Activation dependency: Some subordinators of a superior may have to 
be active in sequence. For example, Call Handling and Run Control must be active in 
sequence while Fire Fighter Service is active, as an elevator in Fire Fighter Service 
can register a call request only before its movement and start traveling immediately 
after registering a call request. Sequential-Activation dependency between two subor- 
dinators of a superior means that one subordinator must be activated immediately 
after the completion of the other during the operation of the superior. 
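
The Exclusive-Activation and Subordinate-Activation rules described above can be
expressed as a small consistency check over the set of currently active features. The
feature names follow the elevator example; the tables EXCLUSIVE and SUPERIORS and the
function activation_violations are assumptions made for this sketch.

```python
from typing import Dict, List, Set, Tuple

# Pairs of features that must never be active at the same time.
EXCLUSIVE: List[Tuple[str, str]] = [
    ("Passenger Service", "Fire Fighter Service"),
]

# Subordinator -> superiors; at least one superior must be active.
SUPERIORS: Dict[str, Set[str]] = {
    "Call Handling": {"Passenger Service", "Fire Fighter Service"},
    "Door Control": {"Passenger Service", "Fire Fighter Service"},
    "Run Control": {"Passenger Service", "Fire Fighter Service"},
}


def activation_violations(active: Set[str]) -> List[str]:
    """List the activation dependencies broken by the given active set."""
    problems: List[str] = []
    for a, b in EXCLUSIVE:
        if a in active and b in active:
            problems.append(f"{a} and {b} must not be active together")
    for feature in sorted(active):
        superiors = SUPERIORS.get(feature)
        if superiors and not (superiors & active):
            problems.append(f"{feature} is active without an active superior")
    return problems
```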

Usage dependencies among features provide important information that is useful 
for designing reusable product line assets. If multiple Usage clients use the same 
Usage supplier, the Usage supplier indicates commonality among Usage clients. 
Although the feature model captures commonalities among products (i.e., common 
features), Usage dependency analysis helps asset designers identify additional com- 
monality (i.e., commonalities among features). This commonality information (i.e., 
commonality among products and commonality among features) is an important input 
to the development of reusable components. 
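
Identifying commonality among features from Usage dependencies amounts to finding
suppliers with more than one client. A minimal sketch follows; the first two
dependencies come from the elevator example, while the Floor Display client and the
function shared_suppliers are hypothetical additions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (client, supplier) pairs.
USAGE: List[Tuple[str, str]] = [
    ("Direction Control", "Position Control"),
    ("Position Control", "Position Sensor"),
    ("Floor Display", "Position Control"),  # hypothetical client
]


def shared_suppliers(usage: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each supplier used by more than one client to its clients."""
    clients: Dict[str, Set[str]] = defaultdict(set)
    for client, supplier in usage:
        clients[supplier].add(client)
    return {s: sorted(c) for s, c in clients.items() if len(c) > 1}
```

Here Position Control is reported as a commonality among features, which makes it a
candidate for a component that is reused as-is by its clients.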

In addition, Usage and Modification dependencies among features have significant
implications in the design of reusable and adaptable product line assets. Suppose a
feature (f1) requires one of two alternative features (f2 and f3) for its correct
implementation. If components implementing f1 directly use components implementing f2 or f3,
components for f1 are changed whenever one of f2 and f3 is selected for a product.
This implies that presence or absence of Usage suppliers may affect many
components implementing their dependent Usage clients. To improve reusability and
adaptability of product line assets, variations of Usage suppliers should be hidden from
components implementing their Usage clients. Similarly, as presence or absence of a
modifier affects the behavior of its dependent modifyee, components implementing
modifyee features must be designed so that they can
be easily adapted or extended without significant changes.

⁴ Door Control is a control operation to open and close doors.

⁵ Run Control is a control operation for acceleration and deceleration of the elevator.

Activation dependencies also have influences on reusability and adaptability of
product line assets. As illustrated above, the operation features (i.e., Call Handling
and Run Control) may be active either concurrently or sequentially depending on the
service mode (i.e., Passenger Service or Fire Fighter Service). As Fire Fighter
Service is a variable feature, its presence causes changes to activation dependencies
among the operation features. If these activation dependencies are hard-coded in the
components implementing the operation features, variations of feature dependency
caused by feature variation (i.e., addition or deletion) may cause significant changes
to many components. To address this problem, activation dependencies among
features must be separated from components implementing the features.
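
One way to picture this separation is to keep the activation dependencies in a table
outside the components. The sketch below is an assumption for illustration, not the
design proposed in this paper: the table ACTIVATION_RULES and the function
activation_plan are hypothetical, and a plan is modeled as a list of steps in which
features in the same step are active concurrently.

```python
from typing import Dict, List, Tuple

# Service mode -> (kind of dependency, subordinators in activation order).
ACTIVATION_RULES: Dict[str, Tuple[str, List[str]]] = {
    "Passenger Service": ("concurrent", ["Call Handling", "Run Control"]),
    "Fire Fighter Service": ("sequential", ["Call Handling", "Run Control"]),
}


def activation_plan(service_mode: str) -> List[List[str]]:
    """Return the activation steps for the operation features of a mode."""
    kind, subordinators = ACTIVATION_RULES[service_mode]
    if kind == "concurrent":
        return [list(subordinators)]            # one step, all together
    return [[name] for name in subordinators]   # one feature per step
```

Including or excluding Fire Fighter Service then changes only one table entry; the
components for Call Handling and Run Control remain untouched.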

With this understanding of engineering implications feature dependencies have in 
the design of reusable and adaptable product line assets, the next section presents 
detailed guidelines for designing product line asset components. 



4 Product Line Component Design 

This section describes component design guidelines on how feature-oriented analysis 
results (i.e., feature commonalities, feature variabilities, and feature dependencies) 
described in the previous sections can be used to develop reusable and adaptable 
product line components. 

Separating commonality from variability: Commonality in a product line can be
looked at from two perspectives: commonalities among products and commonalities 
among features. If features are common across all products in a product line, they 
represent commonalities among products. If features are used by more than one fea- 
ture, they indicate commonalities among features. In order to improve the reusability 
of product line assets, components implementing commonalities must be decoupled 
from variabilities so that they can be commonly reused ‘as-is’ for production of multi- 
ple products. 

Hiding variable features from their Usage client features: As discussed earlier,
variation of a Usage supplier may affect many components implementing its Usage 
clients. In order to localize effects of a variation, the presence or absence of variable 
features must be hidden from components implementing their Usage client features. 
As variable features are classified into alternative, OR, and optional ones, detailed 
component design guidelines for each type of variable features are described below. 

In order to hide alternative features from Usage client features, generic aspects 
among alternative features must be modeled as an interface component, while specific 
aspects must be encapsulated in concrete components. As shown in Fig. 1, the factory 
component instantiates and returns one instance of each concrete component to the 
client component through the getInstance method, and then the client component can 
access the concrete component through the abstract interfaces of the interface 
component. This approach allows run time binding of alternative features into a product 
without affecting other parts of the product. For build time binding to occur, the 
interface component can be designed as a parameterized component, which takes a 
concrete component as an actual parameter. 

Fig. 1. Hiding XOR Variation 
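The run time binding described above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: apart from getInstance, every name here (WeightSensor, DigitalWeightSensor, AnalogWeightSensor, WeightSensorFactory, LoadMonitor) and the stub sensor readings are hypothetical and do not come from Fig. 1.

```java
// Interface component: the generic aspects shared by the alternative features.
interface WeightSensor {
    int readLoadKg();
}

// Concrete components: one per alternative feature (stub readings, made up).
final class DigitalWeightSensor implements WeightSensor {
    @Override public int readLoadKg() { return 480; }
}

final class AnalogWeightSensor implements WeightSensor {
    @Override public int readLoadKg() { return 475; }
}

// Factory component: the only place that knows which alternative was selected.
final class WeightSensorFactory {
    private final String selected;

    WeightSensorFactory(String selected) { this.selected = selected; }

    WeightSensor getInstance() {
        switch (selected) {
            case "digital": return new DigitalWeightSensor();
            case "analog":  return new AnalogWeightSensor();
            default: throw new IllegalArgumentException("unknown alternative: " + selected);
        }
    }
}

// Client component: depends on the interface component and the factory only.
final class LoadMonitor {
    private final WeightSensor sensor;

    LoadMonitor(WeightSensorFactory factory) { this.sensor = factory.getInstance(); }

    boolean isOverloaded(int limitKg) { return sensor.readLoadKg() > limitKg; }
}
```

In this sketch, switching a product from one alternative to the other changes only the argument given to the factory; LoadMonitor is not touched.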




Fig. 2. Hiding OR variation 

OR features can be hidden from Usage client features by introducing the 
ProxyComponent, which encapsulates the Usage dependencies to OR features and forwards 
requests from components implementing Usage client features to corresponding 
concrete components implementing OR features. For example, as shown in Fig. 2, 
ClientComponent1 uses the operation method of the ProxyComponent by giving its 
associated feature (i.e., client1) as a parameter. The operation method then finds the 
concrete component implementing the Usage dependent feature related to the client1 
feature through the findComp method, and forwards the request to the concrete 
component. As client components do not know which concrete components implementing 
OR features will respond to their requests, feature variation does not affect the 
client components. 
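A minimal Java sketch of this forwarding scheme is given below. ProxyComponent, ClientComponent1, operation, findComp and the client1 key follow the text; the OrFeatureComponent interface, the bind method, the request strings, and the map-based lookup are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical common interface of the concrete components implementing OR features.
interface OrFeatureComponent {
    String handle(String request);
}

final class ProxyComponent {
    // Encapsulated Usage dependencies: client feature -> component of the OR feature it uses.
    private final Map<String, OrFeatureComponent> usage = new HashMap<>();

    // Hypothetical configuration hook, called when a product is assembled.
    void bind(String clientFeature, OrFeatureComponent component) {
        usage.put(clientFeature, component);
    }

    private OrFeatureComponent findComp(String clientFeature) {
        return usage.get(clientFeature);
    }

    // Forwards the request to whichever concrete component serves the client feature.
    String operation(String clientFeature, String request) {
        OrFeatureComponent comp = findComp(clientFeature);
        if (comp == null) {
            throw new IllegalStateException("no OR feature bound for " + clientFeature);
        }
        return comp.handle(request);
    }
}

final class ClientComponent1 {
    private final ProxyComponent proxy;

    ClientComponent1(ProxyComponent proxy) { this.proxy = proxy; }

    String run() { return proxy.operation("client1", "ping"); }
}
```

Which OR features are present in a product shows up only in the calls to bind; ClientComponent1 stays the same.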

If the component implementing an optional feature is directly used by the 
components implementing its Usage client features, the presence or absence of the 
former component cannot be fully hidden from its client components. In order to hide 
optional variation from its clients, the presence or absence of an optional feature must 
be decoupled from the components implementing its clients. As shown in Fig. 3, the 
ClientComponent makes use of the ProxyComponent, which forwards the request 
from the ClientComponent to Component F1 if Component F1 exists. If 
Component F1 does not exist, the ProxyComponent does nothing. 



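[Figure: the ProxyComponent holds a reference, comp, to the component of the optional 
feature and forwards a request only when the reference is set: if comp != null, 
comp.operation().] 

Fig. 3. Hiding Optional variation 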
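The null check noted in Fig. 3 can be written out as the following Java sketch. ClientComponent, Component F1 (here ComponentF1), comp and operation come from the text and figure; the class name OptionalFeatureProxy and the serve method are hypothetical.

```java
// Component F1: the component implementing the optional feature.
interface ComponentF1 {
    void operation();
}

// Plays the role of the ProxyComponent in Fig. 3 (class name is hypothetical).
final class OptionalFeatureProxy {
    private final ComponentF1 comp;   // null when the optional feature is not selected

    OptionalFeatureProxy(ComponentF1 comp) { this.comp = comp; }

    void operation() {
        if (comp != null) {
            comp.operation();         // forward only if Component F1 exists
        }
        // otherwise do nothing, so the client never observes the absence
    }
}

final class ClientComponent {
    private final OptionalFeatureProxy proxy;

    ClientComponent(OptionalFeatureProxy proxy) { this.proxy = proxy; }

    void serve() { proxy.operation(); }
}
```

The client code is identical for products with and without the optional feature; only the argument passed to the proxy differs.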

Although this approach can confine any feature variation into a single component, 
this may lead to inefficient systems, as variable features are accessed through abstract 
interfaces. Moreover, there may be a situation where it is very difficult to encapsulate 
variable features in a separate component due to other kinds of feature dependencies 
(e.g., modification and activation dependencies). 

Separating Modification dependency from components implementing modifyee 
features: The presence or absence of modifier features affects components 
implementing their related modifyee features. In order to keep components for modifyee 
features unchanged even in the presence of modifier features, modification 
dependency must be decoupled from components for modifyee features. Inheritance-based 
methods (e.g., class inheritance or mixin [15]) can be used to separate modification 
dependency from components implementing modifyee features. As shown in Fig. 4, 
the operation of the component implementing the Modifier feature extends or 
overrides the component for the Modifyee through the inheritance mechanism. An 
alternative approach could be to implement the Modification dependency using an 
“aspect” [11], which prescribes how components implementing Modifyee and Modifier 
features can be put together to form a complete behavior. 

Fig. 4. Separating Modification dependency 

Fig. 5. Separating Activation dependency 
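As a small illustration of the inheritance-based option, the following Java sketch lets a modifier component extend the modifyee's operation without editing the modifyee. The class names ModifyeeComponent and ModifierComponent and the logged strings are hypothetical; they are not taken from Fig. 4.

```java
import java.util.ArrayList;
import java.util.List;

// Component for the modifyee feature; it contains no knowledge of any modifier.
class ModifyeeComponent {
    protected final List<String> log = new ArrayList<>();

    void operation() { log.add("base behavior"); }

    List<String> log() { return log; }
}

// Component for the modifier feature; included only in products that select it.
class ModifierComponent extends ModifyeeComponent {
    @Override
    void operation() {
        log.add("modifier: adjust before base behavior");
        super.operation();   // extend rather than replace the modifyee's behavior
    }
}
```

A product without the modifier feature instantiates ModifyeeComponent; a product with it instantiates ModifierComponent instead, and the modifyee's source stays unchanged.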

Separating Activation dependency from components: Activation dependencies 
between features must also be separated from components implementing related 
features, so that presence or absence of a feature will not affect components 
implementing other features. In order to separate activation dependencies from components, 
components must be specified in a way that they can execute their functionality only 
when they have permission to execute the functionality. As shown in Fig. 5, the client 
component determines the execution of its operation by asking permission from the 
ActivationManager component. Once the client component has permission to 
execute the operation, it reports the activation and deactivation status to the 
ActivationManager before and after its execution so that other components that depend on the 
activation or deactivation of the feature are notified of the feature state. For instance, 
when a feature f is active, the components implementing the features that are 
dependent on f in terms of Subordinate-Activation or Concurrent-Activation are notified for 
execution. On the other hand, when a feature f is inactive, the components for the 
features that are dependent on f in terms of Exclusive-Activation or 
Sequential-Activation are notified for execution. As this approach makes components 
independent of each other by separating activation dependencies from components, feature 
variation does not affect other components, but only affects the activation manager 
component. 
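The permission protocol can be sketched in Java as shown below. This is a deliberately reduced illustration that handles only Exclusive-Activation and records notifications in a list instead of delivering them; apart from the name ActivationManager, all method and class names (declareExclusive, requestPermission, reportActivation, reportDeactivation, FeatureComponent) are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ActivationManager {
    private final Set<String> active = new HashSet<>();
    private final Map<String, Set<String>> exclusive = new HashMap<>();
    private final List<String> events = new ArrayList<>();   // stands in for notifications

    // Records an Exclusive-Activation dependency between two features.
    void declareExclusive(String a, String b) {
        exclusive.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        exclusive.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    // Permission is granted only if no mutually exclusive feature is active.
    boolean requestPermission(String feature) {
        for (String other : exclusive.getOrDefault(feature, Collections.<String>emptySet())) {
            if (active.contains(other)) {
                return false;
            }
        }
        return true;
    }

    void reportActivation(String feature) {
        active.add(feature);
        events.add("activated " + feature);
    }

    void reportDeactivation(String feature) {
        active.remove(feature);
        events.add("deactivated " + feature);
    }

    List<String> events() { return events; }
}

// A component that knows its own feature but none of the features it depends on.
final class FeatureComponent {
    private final String feature;
    private final ActivationManager manager;

    FeatureComponent(String feature, ActivationManager manager) {
        this.feature = feature;
        this.manager = manager;
    }

    // Runs the body only with permission; reports status before and after execution.
    boolean operation(Runnable body) {
        if (!manager.requestPermission(feature)) {
            return false;
        }
        manager.reportActivation(feature);
        try {
            body.run();
        } finally {
            manager.reportDeactivation(feature);
        }
        return true;
    }
}
```

Adding or removing a feature in this sketch changes only the declareExclusive calls made against the manager, which mirrors the claim that variation is confined to the activation manager component.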

The guidelines presented above can minimize changes to product line asset compo- 
nents, when variable features are included or excluded for production of a particular 
product. In the next section, we illustrate how feature dependencies can be analyzed 
and used to develop product line components using an elevator control software prod- 
uct line. 



5 Feature-Oriented Engineering of Elevator Control Software 

The example selected in this paper is an elevator control software (ECS) product line. 
For the purpose of illustration, we simplified the ECS product line by eliminating 
various indication features (e.g., current floor indication of an elevator), safety-related 
features (e.g., door safety), group management features (e.g., scheduling a group of 
elevators), etc. 



5.1 Commonality and Variability Analysis of the ECS Product Line 

Commonality and variability analysis is the first step towards the development of 
product line assets. Fig. 6 shows the result of a commonality and variability analysis 
of the ECS product line. Features that are common for all elevator products in the 
ECS product line are modeled as common features, while those that may not be 
selected for some products are modeled as variable features (denoted by “«v»”). The ECS 
product line includes the Passenger Service, VIP Service, and Fire Fighter Service 
features. As VIP Service and Fire Fighter Service may or may not be present in a 
particular product, they are modeled as optional features. Similarly, Weight Sensor is 
modeled as an optional feature. However, as an elevator needs only one among digital 
and analog weight sensors, Digital and Analog are modeled as alternative features. 

Configuration dependencies constrain selection of variable features for particular 
products. For example, as Anti-Nuisance (see footnote 6) requires weight data from 
Weight Sensor, the selection of Anti-Nuisance requires the selection of Weight Sensor, 
and therefore the required configuration dependency is modeled as shown in Fig. 6. 
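To make the effect of a required configuration dependency concrete, the following Java sketch checks a feature selection against such dependencies. The feature names are those of the ECS example; the class FeatureConfiguration and its methods are hypothetical and are not part of the method described in this paper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FeatureConfiguration {
    // feature -> features whose selection it requires
    private final Map<String, Set<String>> requires = new HashMap<>();

    void addRequires(String feature, String required) {
        requires.computeIfAbsent(feature, k -> new HashSet<>()).add(required);
    }

    // Lists every required dependency that the given selection leaves unsatisfied.
    List<String> violations(Set<String> selected) {
        List<String> result = new ArrayList<>();
        for (String feature : selected) {
            for (String needed : requires.getOrDefault(feature, Collections.<String>emptySet())) {
                if (!selected.contains(needed)) {
                    result.add(feature + " requires " + needed);
                }
            }
        }
        return result;
    }
}
```

With the dependency of Anti-Nuisance on Weight Sensor registered, a selection that contains Anti-Nuisance but not Weight Sensor is reported as invalid, while a selection containing both passes.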



6 Anti-Nuisance cancels all car calls if an excessive number of car calls are registered 
for the passenger load determined by the weight sensor. 

Fig. 6. The feature model of the ECS product line 



5.2 Feature Dependency Analysis of the ECS Product Line 

Once commonality and variability are analyzed and modeled in the feature model, 
dependencies among the features in the feature model must be analyzed. Fig. 7 repre- 
sents a part of Usage dependencies between features in the ECS product line. As dis- 
cussed earlier, if a feature depends on other features in terms of Usage dependency, 
this means that the former feature requires the latter features for its correct functioning 
or implementation. For example, Passenger Service in the ECS product line is a nor- 
mal driving service of an elevator. During the operation of Passenger Service, the 
elevator registers calls from passengers in the halls or within the car; answers the calls 
by moving the car to the requested floors; and controls (i.e., opens or closes) doors 
when it stops. Therefore, as shown in Fig. 7, Passenger Service requires Hall Call 
Registration, Car Call Registration, Start Control, etc. for its implementation. Start 
Control requires correct functioning of Direction Control, as it controls movement of 
the elevator toward the direction decided by Direction Control. Moreover, Direction 
Control requires correctness of Position Control, as it determines the direction for 
next movement based on the current position of the elevator. 

Modification dependency between two features means that the behavior of a fea- 
ture is changed by the other. As shown in Fig. 7, Load Weighing Bypass changes the 
behavior of Registered Floor Stop by ignoring the registered hall calls when a car is 
loaded to the predetermined level of capacity. Also, Car Call Cancellation changes 
the behavior of Car Call Registration by deleting registered calls. 

Fig. 7. A part of Usage and Modification dependencies in the ECS product line 

In order to represent activation dependencies among features in an effective way, 
we use Statechart [7]. The rounded rectangle represents the active state of a feature. 
As shown in the bottom box of Fig. 8, a feature nested by another feature represents 
that the activation of the former depends on that of the latter. Two features connected 
by an arrow are active sequentially. Two features nested by a 
third feature cannot be active at the same time. If a line 
divides two features nested by a third feature, the two features can 
be active concurrently. With these notations, we can model the 
Subordinate-Activation, Concurrent-Activation, Exclusive-Activation, and Sequential-Activation 
dependencies among features effectively. 

As shown in the upper part of Fig. 8, Passenger Service, VIP Service, and Fire 
Fighter Service cannot be active at the same time. This means that an elevator in the 
ECS product line can provide only one of Passenger Service, VIP Service, and Fire 
Fighter Service at a time. While Passenger Service is active, Car Call Registration 
and Car Call Cancellation must not be active at the same time, and Anti-Nuisance can 
be active after the completion of Car Call Registration. In addition, Hall Call 
Registration must be active concurrently with Car Call Registration, Car Call 
Cancellation, and Anti-Nuisance during the activation of Passenger Service. Moreover, Car 
Call Cancellation, Anti-Nuisance, and Hall Call Registration can be active only when 
Passenger Service is active. They must not be active while VIP Service or Fire 
Fighter Service is active. 

Fig. 8. A part of activation dependencies in the ECS product line 



5.3 Component Design of the ECS Product Line 

Product line assets must be commonly reusable for production of products and easily 
adaptable to product specific variations. In this section, we illustrate how the com- 
monality and variability analysis results and the dependency information can be used 
to design reusable and adaptable product line components for the ECS product line. 

As shown in Fig. 7, Passenger Service uses Car Call Registration, Anti-Nuisance, 
etc. for its implementation. As Anti-Nuisance is a variable feature, it must be 
hidden from the component implementing Passenger Service so that its presence or 
absence will not affect the component for Passenger Service. The CarCallHandler 
component in Fig. 9 is responsible for hiding the presence or absence of 
Anti-Nuisance from the component for Passenger Service and for delegating requests from the 
component for Passenger Service to the component that implements Anti-Nuisance. 

As Car Call Cancellation is a variable feature and modifies the behavior of Car 
Call Registration, the presence or absence of Car Call Cancellation affects the 
component for Car Call Registration. In order to keep that component unchanged regardless 
of the presence or absence of Car Call Cancellation, the component for Car Call 
Cancellation (i.e., CanHandler) extends the component for Car Call Registration (i.e., 
RegHandler) through the inheritance mechanism. 

In addition, as shown in Fig. 8, Car Call Registration and Car Call Cancellation 
must not be active at the same time, while Anti-Nuisance can be active after the 
completion of Car Call Registration. Moreover, Anti-Nuisance and Car Call Cancellation 
must not be active while VIP Service or Fire Fighter Service is active. Because of 
these activation dependencies between features, the presence or absence of variable 
features (e.g., Car Call Cancellation, VIP Service or Fire Fighter Service) may affect 
other components. In order to handle this problem, we need to further separate 
activation dependencies from components. As shown in Fig. 10, the activation manager 
encapsulates activation dependencies between features and coordinates activation of 
components implementing the features. 

Fig. 9. Hiding variability and separating Modification dependency 



6 Related Work 

The feature-oriented domain analysis (FODA) [9] was first introduced in 1990. Since 
then, many product line engineering methods have adopted or extended the feature- 
oriented commonality and variability analysis as an integral part of product line engi- 
neering. 

FeatuRSEB [6] extended RSEB (Reuse-Driven Software Engineering Business) 
[8], which is a reuse and object-oriented software engineering method based on the 
UML notations, with the feature model of FODA. The feature model of FeatuRSEB is 
used as a catalogue of or index to the commonality and variability captured in the 
RSEB models (e.g., use case and object models). Generative Programming [2] also 
uses feature modeling as the key to the development and configuration of 
components for a product family. Although these methods adopted feature modeling to 
analyze commonality and variability among products, they do not analyze dependencies 
among features explicitly. With these methods, dependencies among features are taken 
into account implicitly during object or component design. 

Fig. 10. Separating activation dependency 

Recently, several attempts have been made to analyze and document dependencies 
among features explicitly. Ferber et al. [3] identified five types of feature 
dependencies or interactions (i.e., Intentional Interaction, Resource-Usage Interaction, 
Environment Induced Interactions, Usage Dependency, Excluded Dependency) in the 
engine control software domain and extended the feature model in terms of the 
dependency and interaction view for reengineering a legacy product line. Fey et al. [4] 
introduced another type of dependency, called “Modify” relation, in addition to 
“Require” and “Conflict” relations, which are the same as Required and Excluded 
dependencies in the original FODA method. Although these works identified several 
dependencies or interactions between features, they did not show how feature 
dependencies influence product line asset development. The primary 
contribution of this paper is to make an explicit connection between feature dependency 
analysis and product line asset design by showing how the analysis results can be used to 
design reusable and adaptable product line components. 



7 Conclusion 

The feature-oriented commonality and variability analysis method has been 
considered an essential step for product line asset development. In the past, we applied the 
method to the core part of elevator control software for development of reusable and 
adaptable assets. We found that we could easily add new control 
techniques and new hardware devices without major modification of the existing assets 
[13]. However, when we tried to incorporate various market specific services of 
elevators into the assets, we found it difficult to evolve the assets because the 
effects of service additions could not be localized to one or a few components. This is 
mainly due to the fact that many services of elevators are highly dependent on each 
other but the assets were designed without consideration of various dependencies among 
services. 

To address the difficulties, this paper has extended the method by including feature 
dependency analysis. In addition, we have made explicit connections between feature 
dependency analysis and product line component design. The explicit connections 
can help asset designers not only develop assets envisioned for a product line but also 
evolve the assets to support the future growth of the product line, as they can provide 
the rationale of design decisions made during asset development. Currently, we are 
reengineering the existing assets of elevator control software to validate the proposed 
method. 

In addition to feature dependencies, feature interactions [12] also have significant 
influences on product line asset development. When products are developed with 
integration of components implementing various features, these features may interact 
with each other in unexpected ways. Handling feature interactions may cause signifi- 
cant changes to product line assets, if interaction related code is scattered across many 
components. To improve reusability and adaptability of product line assets, they must 
be designed in a way that interaction related code can be confined to a small number 
of components. Therefore, we are currently extending the method proposed in this 
paper by also taking feature interaction as a key design driver for product line 
component development. 



Abstract. To provide performance trade-offs to users, reusable component 
libraries typically include multiple implementation variants for each interface. 
This paper introduces a scalable notion of enhancements to extend interfaces 
with new features. Enhancements provide flexibility along two dimensions: 
They allow users to combine any set of features with the base interface and they 
allow any implementation variant of the base interface to be combined with any 
implementation variant of each feature. These two dimensions of flexibility are 
necessary for reusable libraries to remain scalable. To address the feature 
flexibility problem, this paper introduces a general notion of enhancements that 
decouple feature implementations from the implementations of base interfaces. 
The paper explains an approach for realizing enhancements in standard Java and 
analyzes its benefits and limitations. It examines a simple mechanism to support 
enhancements directly in languages such as Java. 

Keywords: Components, interfaces, Java, maintenance, multiple implementa- 
tions, objects, patterns, and RESOLVE. 



1 Introduction 

Modern programming languages support separation of interfaces from implementa- 
tion details. This separation is necessary to develop multiple implementations for the 
same interface and provide users of the interface with performance tradeoffs. In 
general, there is no one best implementation for an interface. One implementation 
might be faster whereas another might make better use of storage. One might be better in 
the average case whereas another might be more predictable. Given the choices, users 
will pick implementations that best fit their application requirements. The current 
Java component library, for example, contains a List interface and multiple 
implementation variants (array, linked list, vector) that provide different performance 
tradeoffs [Sun 03]. 

While the idea of developing multiple implementations of an interface is central to 
component-based software development, it raises a scalability problem 
when new features need to be added to existing interfaces. It is clear that even a 
well-designed component interface cannot support every feature every user might need. 
Therefore, it must be possible to extend interfaces with new features. The most 
obvious way to add a feature, such as the ability to search a container component, is 
by adding a new operation to its interface. However, if an existing interface is 
modified to include a new feature, all existing implementation variants will need to be 
modified and the feature will have to be implemented for each variant [Biggerstaff 
94, Krueger 92]. This combinatorial explosion causes the cost of developing features 
to grow faster than the value the features add. Biggerstaff concludes that 
the only satisfactory solutions involve layers of abstraction (LOA). While there are 
many variations on implementing an LOA, the key point is to avoid implementing a 
new feature more than once. The implementation of the feature must be such that it 
works with each of the existing implementations of the base interface. Doing so will 
lead to libraries that grow at a linear rate in terms of the number of features. 

In addition to the feature scalability problem noted above, there is at least one other 
complication not typically addressed in the literature. Additional features can be 
implemented in multiple ways to provide performance trade-offs. For example, a 
search feature for a tree can be done either depth first or breadth first, and use any of a 
variety of search algorithms. Implementation variants for features compound the 
combinatorial explosion problem. Feature addition must be implemented such that 
any feature implementation will work with any of the component’s base 
implementations. 
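As a rough count of the growth discussed above, suppose (symbols introduced here for illustration only) that a library has $n$ implementation variants of a base interface, $m$ features, and $k$ implementation variants per feature. If every feature variant has to be written separately for every base variant, the number of feature implementations is the product of all three; if feature implementations are decoupled from base implementations, each feature variant is written once:

```latex
\[
  \underbrace{n \cdot m \cdot k}_{\text{features coded per base variant}}
  \qquad \text{versus} \qquad
  \underbrace{m \cdot k}_{\text{features decoupled from base variants}}
\]
% Example: n = 3, m = 4, k = 2 gives 24 feature implementations versus 8.
```

In the decoupled case the count no longer depends on $n$ and grows linearly in the number of features.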

To provide flexible feature selection, we introduce the notion of enhancements and 
illustrate their use from a client’s perspective in Section 2. It is not possible to realize 
enhancements using single inheritance mechanisms in languages such as Java, 
because inheritance requires a specific ordering of features and makes it difficult for 
users to mix and match features. Therefore, we examine the use of the decorator pattern 
[Gamma 95] and dynamic proxies to address this difficulty, and explain their 
limitations in Section 3. This discussion serves as the basis for the approach we 
propose to realize enhancements in standard Java in Section 4. The section contains a 
cost-benefit analysis. Section 5 examines a simple language feature to support 
enhancements of generic components directly using only single inheritance and 
considers other language design suggestions. It concludes with a summary. 



2 Introduction to Enhancements 

Figure 1 illustrates the basic idea of enhancements. The figure shows implementation 
alternatives for interfaces that provide base types and feature extensions. In the figure, 
circles represent interfaces. I1 is the base interface. E1, E2, and E3 are interfaces for 
new features. The boxes represent implementation variants for the base interface and 
features. Shaded items show a particular combination a client might choose to 
compose. This paper explains how to make or modify such choices with only local 
changes to only one file, regardless of the number of implementation variants, 
minimizing software development and maintenance costs. The approach requires and 
takes advantage of the fact that implementations of feature extensions are decoupled 
from the implementations of the base type. 

Fig. 1. Alternative Implementation Variants for Interfaces and Features 

For a concrete example, suppose that a client needs to extend a Stack component 
interface that provides basic stack operations, such as push, pop, and empty. In 
particular, suppose that peek and search are the features (or operations) to be added 
and that PeekCapability and SearchCapability are the interfaces that 
define these two features, respectively. If the client wants to choose the 
implementations ArrayImpl for the Stack interface, PeekImpl1 for 
PeekCapability, and LinSearchImpl for SearchCapability, then using the 
code pattern proposed in Section 4 of this paper, the following declaration can be 
used: 

    Stack myStack = LinSearchImpl.addEnhancement(
                        PeekImpl1.addEnhancement(
                            new ArrayImpl()
                        )
                    );

The factory method addEnhancement is used as a wrapper, as explained in 
Section 4. This example shows that a client needs to make the implementation 
choices only when an object is created. This flexibility allows implementations to be 
changed at a single location in the code. For example, to switch the implementations 
in the myStack declaration, only a local modification is necessary, as shown below: 

    Stack myStack = BinSearchImpl.addEnhancement(
                        PeekImpl2.addEnhancement(
                            new VectorImpl()
                        )
                    );

Once a Stack with appropriate features is created, the operations of the base type 
Stack (push, pop, empty) can be called directly, as in the call below: 

    myStack.push("abc");

In general, a cast is needed to call an enhanced operation, though it can be avoided 
if only one enhancement is added: 

    ((SearchCapability) myStack).search();



3 Limitations of Decorator Pattern and Dynamic Proxies 

This section explains the technical limitations of two well-known patterns for 
realizing enhancements. It is important to understand these alternatives because the new 
code pattern we propose in Section 4 employs a combination of these patterns. 



3.1 Limitations of the Decorator Pattern to Solve the Problem 

Of the well-known design patterns [Gamma 95], the decorator pattern best fits the 
problem. The objectives of the pattern are to “attach additional responsibilities to an 
object dynamically” and to “provide a flexible alternative to subclassing for extending 
functionality.” [Gamma 95]. With decorators, optional and possibly multiple 
extensions to a single base type are provided. Different implementations can be 
selected for the base interface and decorators by instantiating the right class. Given 
below is an example that creates an object with two enhancements using a decorator 
pattern. 

    Stack myStack = new LinSearchImpl(
                        new PeekImpl1(
                            new ArrayImpl()
                        )
                    );

While this pattern seems to be a solution, it has a serious limitation: the decorators 
and the base class must have the same interface. There is no way to extend the 
interface at runtime. The problem is that the outermost enhancement is aware of its 
methods and the base type methods, but it cannot know the methods of other 
enhancements. Therefore, when a method call belonging to another enhancement is 
made, there is no code to handle it. In most typed languages, including Java, such a 
call will result in a compile error. Using a cast only postpones the error until run-time. 
However, a client can mix and match implementations using the decorator pattern by 
instantiating different implementation classes, and we will take advantage of this 
characteristic of the decorator pattern in developing a new pattern. 
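The limitation can be reproduced with a small self-contained Java program. The sketch below is an assumption-laden illustration: the class names follow the example above, but the StackDecorator base class, the signature of search, the ArrayDeque-backed ArrayImpl, and DecoratorLimitationDemo are made up for this demonstration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

interface Stack { void push(Object o); Object pop(); boolean empty(); }
interface PeekCapability extends Stack { Object peek(); }
// The signature of search is assumed; the text does not give it.
interface SearchCapability extends Stack { boolean search(Object target); }

class ArrayImpl implements Stack {
    private final Deque<Object> items = new ArrayDeque<>();
    public void push(Object o) { items.push(o); }
    public Object pop() { return items.pop(); }
    public boolean empty() { return items.isEmpty(); }
}

// Classic decorator: forwards the base operations to the wrapped stack.
abstract class StackDecorator implements Stack {
    protected final Stack inner;
    StackDecorator(Stack inner) { this.inner = inner; }
    public void push(Object o) { inner.push(o); }
    public Object pop() { return inner.pop(); }
    public boolean empty() { return inner.empty(); }
}

class PeekImpl1 extends StackDecorator implements PeekCapability {
    PeekImpl1(Stack inner) { super(inner); }
    public Object peek() { Object o = pop(); push(o); return o; }
}

class LinSearchImpl extends StackDecorator implements SearchCapability {
    LinSearchImpl(Stack inner) { super(inner); }
    public boolean search(Object target) {
        Deque<Object> popped = new ArrayDeque<>();
        boolean found = false;
        while (!found && !empty()) {          // linear scan through the public interface
            Object o = pop();
            popped.push(o);
            found = o.equals(target);
        }
        while (!popped.isEmpty()) { push(popped.pop()); }   // restore original order
        return found;
    }
}

class DecoratorLimitationDemo {
    // Returns true if search works on the outermost decorator but peek is unreachable.
    static boolean peekIsLost() {
        Stack myStack = new LinSearchImpl(new PeekImpl1(new ArrayImpl()));
        myStack.push("abc");
        boolean found = ((SearchCapability) myStack).search("abc");
        try {
            ((PeekCapability) myStack).peek();   // compiles, but LinSearchImpl has no peek
            return false;
        } catch (ClassCastException expected) {
            return found;
        }
    }
}
```

The cast to PeekCapability is accepted by the compiler because myStack has an interface type, yet it fails at run time because the outermost decorator, LinSearchImpl, does not implement PeekCapability.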



3.2 Limitations of Dynamic Proxies to Solve the Problem 

Another solution to the feature selection problem is to use the dynamic proxy 
mechanism, introduced in Java 1.3 [Sun 99]. Normally, a proxy provides a stand-in 
object that has the same operations as the object for which it is a replacement. 
Indeed, Gamma et al. note that “a proxy provides the same interface as its subject” 
[Gamma 95]. However, Sun’s language support for the proxy pattern contains two 
features that provide much of the flexibility needed to achieve our goal: 

1. Sun’s factory method for dynamic proxy takes an array of interfaces instead of 
a single interface. Using reflection in Java [Sun 03] - a feature that allows a 
program to examine itself at runtime - it becomes possible to specify at runtime 
which interfaces should be included. 

2. All calls on the proxy are routed to the same method - invoke. This provides 
a single point from which to forward non-local method calls. 

Given below is an example of how to create a proxy object for base type with two 
feature extensions: 

    Class[] myInterfaces =
        {BaseType.class, Enh1.class, Enh2.class};

    BaseType myObj = (BaseType) Proxy.newProxyInstance(
        BaseType.class.getClassLoader(),
        myInterfaces, implementation);

This code creates an array of interfaces: one interface for the base type and one 
for each enhancement. Then the factory method for dynamic proxy is called. 
The call uses the current class loader, the array of interfaces created in the first line of 
code, and an implementation object. It returns a proxy object. While dynamic proxies 
make it possible to deal with multiple interfaces, there remains a problem: a dynamic 
proxy allows only a single implementation object that must implement all the 
interfaces specified. To allow flexible feature selection, we would like to be able to 
provide a separate implementation for each feature. The dynamic proxy does not 
provide a means to combine independently written implementations of code. 
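A runnable version of the proxy creation is sketched below. BaseType, Enh1, Enh2 and myInterfaces follow the example; the methods of these interfaces and the classes AllInOne, ForwardingHandler and DynamicProxyDemo are assumptions. In the standard API the third argument of Proxy.newProxyInstance is an InvocationHandler, so the sketch wraps the single implementation object in a handler that forwards every call to it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface BaseType { String base(); }
interface Enh1 { String one(); }
interface Enh2 { String two(); }

// The single implementation object: it has to implement every listed interface.
class AllInOne implements BaseType, Enh1, Enh2 {
    public String base() { return "base"; }
    public String one() { return "one"; }
    public String two() { return "two"; }
}

// All calls on the proxy arrive at invoke, which forwards them to one target.
class ForwardingHandler implements InvocationHandler {
    private final Object target;

    ForwardingHandler(Object target) { this.target = target; }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();   // rethrow the exception raised by the target itself
        }
    }
}

class DynamicProxyDemo {
    static BaseType create() {
        Class<?>[] myInterfaces = { BaseType.class, Enh1.class, Enh2.class };
        return (BaseType) Proxy.newProxyInstance(
                BaseType.class.getClassLoader(),
                myInterfaces,
                new ForwardingHandler(new AllInOne()));
    }
}
```

The returned object can be cast to Enh1 or Enh2, but both features are necessarily supplied by AllInOne; there is no slot for a separately written implementation of each feature.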

In summary, the decorator pattern allows a varying number of implementations to 
be combined into a single object. The dynamic proxy allows multiple interfaces to be 
combined into a single interface. The dynamic proxy also provides a single point 
through which all calls are processed. 



4 Realizing Enhancements in Standard Java 

While neither the decorator pattern nor the dynamic proxy can by itself 
tackle the feature selection problem, they provide complementary pieces necessary to 
a solution, as explained in this section. We assume that each component in a library 
has a base interface, implementation variants, feature enhancement interfaces, and 
implementation variants for each enhancement interface. In addition, the 
proposed approach requires an abstract class that captures code common to all 
enhancements of a particular base interface, as discussed in Section 4.2. 



4.1 Example Interfaces and Implementations 

We begin with an (unsurprising) stack base interface that has fundamental Stack 
operations as shown below. 

    public interface Stack {
        void push(Object o);
        Object pop();
        boolean empty();
    }

The library may include several classes that implement the same Stack interface, 
such as ArrayImpl, ListImpl, or VectorImpl. An outline of ArrayImpl is 
shown below: 

    public class ArrayImpl implements Stack {
        public ArrayImpl() {...}
        public void push(Object o) {...}
        public Object pop() {...}

    }

When using enhancements, the interfaces of components should provide only a 
minimal set of operations, because it is easy to add additional operations. The library 
will contain several features to extend the Stack interface, such as peek and search 
among others. Shown below is an interface to add the peek operation: 

    public interface PeekCapability extends Stack {
        Object peek();
    }

The first difference in implementation with enhancements arises in implementing a 
feature, such as peek. A class that implements the extended interface must follow 
the composition technique shown below. 

    public class PeekImpl extends StackEnhancement
                          implements PeekCapability {

        public PeekImpl(Stack baseInterface) {
            super(baseInterface);
            setup();
        }

        public static Stack addEnhancement(Stack toWrap) {
            return StackEnhancement.addEnhancement(
                       new PeekImpl(toWrap));
        }

        public Object peek() {
            Object o = pop();
            push(o);
            return o;
        }
    }

PeekImpl extends StackEnhancement (which in turn implements the Stack 
interface); therefore, PeekImpl must provide code for all operations in the Stack 
interface as well as peek. PeekImpl provides the code for peek directly and uses 
StackEnhancement to implement Stack operations. StackEnhancement is 
an abstract class that contains the bulk of the code needed to make the component 
composable. Classes such as PeekImpl that implement new features must provide a 
constructor, an addEnhancement method, and code for the added operation(s). 

The constructor takes an object that provides the base interface’s type. Since an 
enhancement interface extends the base interface, either the base or an enhancement 
may be passed. This allows us to add any number of enhancements. The constructor 
passes this object to the parent’s constructor. The need to call setup, defined in 
StackEnhancement, is explained in Section 4.2. 

The addEnhancement method acts as a factory, producing the completed object 
for the client, by combining a call to the enhancement’s constructor with a call to 
addEnhancement in the StackEnhancement class. This method is the same 
for all enhancements, except for the constructor used. 

We conclude the discussion by noting that there is nothing special about the code for 
peek itself. The only aspect of importance is that PeekImpl must accomplish its 
work only through the public interface provided by Stack. 



4.2 Crux of the New Code Pattern 

Enhancements use an abstract class for each base interface which contains common 
enhancement code. Some of this code is necessary to implement the base interface in 
addition to new features. The rest of the code is needed to make enhancements 
composable. This abstract class is dependent only on the base interface, not on 
optional features or implementations. 

Figure 2 shows a UML class diagram for a single enhancement to illustrate the 
general idea. Method calls to the enhancement are made against the dynamic proxy, 
which validates the call against the array of interfaces that it stores. After a successful 
validation the dynamic proxy will call the invoke method of the enhancement 
implementation and pass it the method that was called. The invoke method will 
decide if the method is local, calling the local method if it is, or pass responsibility for 
the call on by calling the method on the next enhancement or base object in the chain. 

[Figure: UML class diagram; surviving labels: Base Interface, Enhancement Interface] 

Fig. 2. A UML diagram showing the structure of a single enhancement 

The code pattern for the example abstract class StackEnhancement for the 
Stack interface is shown below. 

import java.lang.reflect.Proxy; 
import java.lang.reflect.InvocationHandler; 
import java.lang.reflect.Method; 

abstract public class StackEnhancement 
        implements Stack, InvocationHandler { 

    private Stack baseInterface; 
    protected Method[] methods; 

    protected StackEnhancement(Stack baseInterface) { 
        this.baseInterface = baseInterface; 
    } 

    // Wrapper methods: forward each base operation to the wrapped object. 
    public void push(Object o) { 
        baseInterface.push(o); 
    } 

    public Object pop() { 
        return baseInterface.pop(); 
    } 

    public boolean empty() { 
        return baseInterface.empty(); 
    } 

    // Caches the methods of the enhancement interface implemented locally. 
    protected void setup() { 
        Class[] interfaces = getClass().getInterfaces(); 
        methods = interfaces[0].getMethods(); 
    } 

    // Handles the call locally if possible; otherwise forwards it inward. 
    public Object invoke(Object proxy, 
                         Method method, 
                         Object[] args) throws Throwable { 
        for (int i = 0; i < methods.length; i++) 
            if ((methods[i]).equals(method)) 
                return method.invoke(this, args); 
        return method.invoke(baseInterface, args); 
    } 

    // Builds a proxy exposing the new interface plus all wrapped interfaces. 
    public static Stack addEnhancement( 
            StackEnhancement eObj) { 
        Stack toWrap = eObj.baseInterface; 
        Class[] toWrapInterfaces = 
            toWrap.getClass().getInterfaces(); 
        Class[] thisInterfaces = new 
            Class[toWrapInterfaces.length + 1]; 
        Class[] tmpInterfaces = 
            eObj.getClass().getInterfaces(); 
        thisInterfaces[0] = tmpInterfaces[0]; 
        System.arraycopy(toWrapInterfaces, 0, 
                         thisInterfaces, 1, 
                         toWrapInterfaces.length); 
        return (Stack) (Proxy.newProxyInstance( 
            Stack.class.getClassLoader(), 
            thisInterfaces, eObj)); 
    } 
} 

StackEnhancement implements the base interface Stack and therefore, an 
enhancement may be substituted for the base. The class has a simple wrapper method 
for each base operation. The wrapped methods forward calls to the corresponding 
base methods. Two other tasks make extensions composable. The first of these is to 
collect all the interfaces, base and enhanced, that belong to the object under 
construction. The second is to distinguish calls that have local implementations from 
those that should be forwarded. 

A combination of the decorator pattern and dynamic proxies is used to accomplish 
these tasks. First, a compound object is built up through the constructor in a manner 
similar to the decorator pattern. As with the decorator, a call is either handled locally 
or passed through the reference established by the constructor to the next object. A 
call on the compound object passes through the invoke method, provided by the 
dynamic proxy. Going through the method does not negate Java’s type checking, 
either at compile time or at runtime. To avoid type errors, the object must include all the 
interfaces, base and enhancement, that the client has composed. At the time the 
common enhancement code is written, the specific optional features and 
implementation variants chosen by a particular client will not be known. Therefore, a 
general mechanism is necessary to build a list of methods at runtime. 

Composing an object builds it from the inside out. New layers are added by calling 
addEnhancement. The innermost object is simply a base object, which is unaware 
of this mechanism. The enhancement object includes all the interfaces of the object 
passed to it, then adds its own interface. Since each enhancement works this way, 
addEnhancement needs only the list of interfaces from the outermost layer of the 
object passed. Interfaces are retrieved using the getInterfaces() method on the 
object’s related Class object. This method is called on the current object and the 
object passed in. The addEnhancement method builds a new array that holds both 
sets of interfaces. This array of interfaces is used by newProxyInstance to create 
the object that is returned. 
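
The interface bookkeeping described above can be sketched in isolation. The following is a minimal illustration, not the paper's code: the class InterfaceListDemo and its helpers merge and proxyFor are hypothetical names, and standard library interfaces (Runnable, Callable, Comparable) stand in for base and enhancement interfaces. It relies on the documented behaviour that getInterfaces() on a proxy's Class object returns the interfaces given at creation, in the same order.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.concurrent.Callable;

class InterfaceListDemo {
    // Prepends 'own' to the interfaces reported by the class of 'inner',
    // mirroring the array built inside addEnhancement.
    static Class<?>[] merge(Class<?> own, Object inner) {
        Class<?>[] innerInterfaces = inner.getClass().getInterfaces();
        Class<?>[] all = new Class<?>[innerInterfaces.length + 1];
        all[0] = own;
        System.arraycopy(innerInterfaces, 0, all, 1, innerInterfaces.length);
        return all;
    }

    // Creates a proxy that implements the given interfaces and does nothing.
    static Object proxyFor(Class<?>[] interfaces) {
        InvocationHandler noop = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) {
                return null;
            }
        };
        return Proxy.newProxyInstance(
                InterfaceListDemo.class.getClassLoader(), interfaces, noop);
    }

    // Stacks two layers and returns the interface list a third layer would use.
    static Class<?>[] demo() {
        Object layer1 = proxyFor(new Class<?>[] { Runnable.class });
        Object layer2 = proxyFor(merge(Callable.class, layer1));
        return merge(Comparable.class, layer2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Because each layer republishes everything below it, only the outermost layer needs to be inspected when a further layer is added.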

To improve performance, the code that determines if a method is local is split into 
two pieces. First, a list of the local methods is developed. This cannot be done in the 
StackEnhancement constructor, because the list relies on the this pointer, which 
is not available until parent constructors have finished. Instead, the list is created in 
the setup method that must be called by the enhancement’s constructor. Calling 
setup in the constructor means it is called just once, when the object is created. 

All method calls on a dynamic proxy object are dispatched through the handler’s 
invoke method, which determines if the method is implemented locally or needs to 
be forwarded to a more inner object. If the method is local, and thus cached by setup, 
the method is invoked on the this pointer; otherwise, it is invoked on the 
baseInterface pointer. 
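
To show how a client might use the pattern end to end, the sketch below condenses the classes of Sections 4.1 and 4.2 into one self-contained file and adds a small client. Several details are assumptions made for illustration: the types are package-private so that they fit in a single file, ArrayImpl is backed by an ArrayList because the paper elides its body, and EnhancementDemo is a hypothetical client class.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;

interface Stack { void push(Object o); Object pop(); boolean empty(); }

interface PeekCapability extends Stack { Object peek(); }

// Base implementation; the ArrayList body is an assumption.
class ArrayImpl implements Stack {
    private final ArrayList<Object> items = new ArrayList<Object>();
    public void push(Object o) { items.add(o); }
    public Object pop() { return items.remove(items.size() - 1); }
    public boolean empty() { return items.isEmpty(); }
}

abstract class StackEnhancement implements Stack, InvocationHandler {
    private final Stack baseInterface;
    protected Method[] methods;

    protected StackEnhancement(Stack baseInterface) { this.baseInterface = baseInterface; }

    public void push(Object o) { baseInterface.push(o); }
    public Object pop() { return baseInterface.pop(); }
    public boolean empty() { return baseInterface.empty(); }

    // Caches the methods visible through the enhancement interface.
    protected void setup() { methods = getClass().getInterfaces()[0].getMethods(); }

    // Local methods run on this object; all others are forwarded inward.
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        for (Method local : methods)
            if (local.equals(method)) return method.invoke(this, args);
        return method.invoke(baseInterface, args);
    }

    public static Stack addEnhancement(StackEnhancement eObj) {
        Class<?>[] inner = eObj.baseInterface.getClass().getInterfaces();
        Class<?>[] all = new Class<?>[inner.length + 1];
        all[0] = eObj.getClass().getInterfaces()[0];
        System.arraycopy(inner, 0, all, 1, inner.length);
        return (Stack) Proxy.newProxyInstance(Stack.class.getClassLoader(), all, eObj);
    }
}

class PeekImpl extends StackEnhancement implements PeekCapability {
    public PeekImpl(Stack baseInterface) { super(baseInterface); setup(); }
    public static Stack addEnhancement(Stack toWrap) {
        return StackEnhancement.addEnhancement(new PeekImpl(toWrap));
    }
    public Object peek() { Object o = pop(); push(o); return o; }
}

class EnhancementDemo {
    // Composes a base stack with the peek feature and exercises both.
    static Object[] run() {
        Stack s = PeekImpl.addEnhancement(new ArrayImpl());
        s.push("a");
        s.push("b");
        Object top = ((PeekCapability) s).peek(); // the cast succeeds on the proxy
        Object popped = s.pop();
        return new Object[] { top, popped, Boolean.valueOf(s.empty()) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

The client holds only a Stack reference and casts to PeekCapability when it needs the added operation; the choice of ArrayImpl appears in exactly one place.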



4.3 A Cost Benefit Analysis 

The proposed approach to realize enhancements in Java modularizes feature addition, 
and therefore, improves maintainability and flexibility. This is illustrated in Table 1. 
In the table, the “Monolithic” column refers to a non-modular approach of using a single 
interface and class to bundle the base type and all necessary features. The “No 
Interface” approach is the same as “Monolithic” except that no interface is defined or 
used. 

To evaluate the maintenance cost, suppose that a base type has N implementation 
variants. To add one new feature, the code pattern using enhancements requires only the 
interface and implementation of the new feature to be developed. All other 
approaches require modification of all N base variants. When M implementations of 
a new feature are added, the enhancement approach affects only those M 
implementations. The cost of all other approaches, except the Decorator pattern, 
grows in proportion to M * N. Clearly, the number of implementation 
variants for the base type affects the cost to add a new feature. Booch has reported 
finding 26 meaningful implementation variants for classical abstract data type 
interfaces [Booch 87]. 
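
As a worked illustration, the formulas of Table 1 can be evaluated for concrete values. The sketch below takes N = 26 from the Booch figure quoted above; M = 3 is an arbitrary value chosen for illustration, and the class and method names are hypothetical. The counts are read here as the number of interfaces and classes that must be written or modified, which is an interpretation of the table rather than a definition given in the paper.

```java
class FeatureCost {
    // Each method returns the Table 1 cost of adding one feature interface
    // with m implementations to a base type with n implementation variants.
    static int enhancement(int n, int m)  { return m + 1; }
    static int dynamicProxy(int n, int m) { return m * n + 1; }
    static int decorator(int n, int m)    { return m + n + 1; }
    static int monolithic(int n, int m)   { return m * n + 1; }
    static int noInterface(int n, int m)  { return m * n; }

    public static void main(String[] args) {
        int n = 26; // implementation variants reported in [Booch 87]
        int m = 3;  // illustrative number of feature implementations (assumption)
        System.out.println("Enhancement:   " + enhancement(n, m));
        System.out.println("Dynamic Proxy: " + dynamicProxy(n, m));
        System.out.println("Decorator:     " + decorator(n, m));
        System.out.println("Monolithic:    " + monolithic(n, m));
        System.out.println("No Interface:  " + noInterface(n, m));
    }
}
```

With these values the enhancement approach touches 4 units, the Decorator 30, and the remaining approaches 78 or 79, which makes the M * N growth concrete.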

To gain the runtime flexibility needed to support composable libraries, 
enhancements use more indirection. To measure the cost, we ran an experiment 
using a combination of calls to base operations and enhancement operations. Table 2 
shows the time in seconds for each call. The first column is the ratio of call 
overhead to computation time. These results are based on repeating a ten-operation 
loop 100,000,000 times, and finding the average time for each call. The test was run 
on a Pentium III running at 867 MHz with 382M of memory using the Solaris OS and 
JDK 1.4.2. 

Table 1. Software development/maintenance cost comparison to add a new feature 

Feature addition                     Enhancement   Dynamic Proxy   Decorator    Monolithic   No Interface 
1 interface and 1 implementation     1 + 1 = 2     N + 1           N + 2        N + 1        N 
1 interface and M implementations    M + 1         M * N + 1       M + N + 1    M * N + 1    M * N 

Table 2. Performance comparison between enhancement and non-composable approaches 

(Time per call, in seconds) 

Ratio of call overhead 
to computation time      Enhancement   Dynamic Proxy   Decorator   Monolithic   No Interface 
2:1                      0.0000336     0.00001905      0.0000117   0.0000114    0.00001122 
1:1                      0.0000448     0.00003025      0.0000229   0.0000226    0.00002242 
1:2                      0.0000672     0.00005265      0.0000453   0.000045     0.00004482 



In the worst case, the time per method call with enhancements is about 3 times as 
long as it is in the most non-modular approach of using no interfaces at all, as 
seen from the first row. As the computation time increases, the performance of 
enhancements compares favorably. For example, the third row shows that it takes 
only about 1.5 times as long as the decorator pattern to provide the benefit of composability. 
This is still a small price to pay in absolute terms and is likely to be less because call 
overhead is unlikely to constitute a significant proportion of an actual application. 
The table also documents the cost of interfaces and use of dynamic proxies in Java. 
As seen from the last two columns, just adding an interface causes a performance 
degradation. In general, operations that are frequently used, particularly those that 
can get a large performance improvement from having access to the object’s internal 
state, should go in the base type. The library designer must consider the tradeoffs 
between increased maintenance costs from including a large number of operations 
into the base interface or the potential increase in performance costs that arise from 
placing them in enhancements. 
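
The slowdown factors quoted in this section follow directly from the figures in Table 2. The sketch below recomputes them; the class and method names are hypothetical, and only the Enhancement, Decorator, and No Interface columns are included.

```java
class OverheadRatios {
    // Seconds per call from Table 2; index 0, 1, 2 corresponds to the
    // rows with overhead-to-computation ratios 2:1, 1:1, and 1:2.
    static final double[] ENHANCEMENT  = { 0.0000336,  0.0000448,  0.0000672 };
    static final double[] DECORATOR    = { 0.0000117,  0.0000229,  0.0000453 };
    static final double[] NO_INTERFACE = { 0.00001122, 0.00002242, 0.00004482 };

    // How many times longer a call takes with 'approach' than with 'baseline'.
    static double slowdown(double[] approach, double[] baseline, int row) {
        return approach[row] / baseline[row];
    }

    public static void main(String[] args) {
        for (int row = 0; row < 3; row++) {
            System.out.println("row " + row
                + ": vs no interface = " + slowdown(ENHANCEMENT, NO_INTERFACE, row)
                + ", vs decorator = " + slowdown(ENHANCEMENT, DECORATOR, row));
        }
    }
}
```

The first row gives a factor close to 3 against the no-interface approach, and the third row a factor close to 1.5 against the decorator, matching the figures cited in the text.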



5 Discussion 

The cost of developing and using enhancements in languages such as Java can be 
mitigated through direct language support. In this section, we examine related work. 



5.1 RESOLVE Enhancements - A Static Feature Extension Mechanism 

The work most directly related to the present paper is the notion of enhancements 
(based on single inheritance) in RESOLVE [Sitaraman 93, Sitaraman 94]. We 
summarize its simple mechanism for supporting flexible feature selection in this 
section. The overall goal is to be able to write several secondary operations to extend 
the set of primary operations provided by the base concept, as depicted in Figure 1. 
For an example, suppose that StackTemplate is an interface parameterized to be 
generic (e.g., by the type of entries in the Stack). Shown below is the skeleton of a 
ReversalCapability interface specification. 

Enhancement ReversalCapability for StackTemplate; 
    Operation Reverse ... 
    // specification to reverse a Stack 
end ReversalCapability; 

The enhancement implementation shown below cannot access the data 
representation of a Stack. It needs to be written using only Stack primary operations. 
In addition to enabling modular software development, the decoupling eases 
reasoning about component behavior [Sitaraman 00]. 

Realization LocalStackImpl for 
        StackTemplate.ReversalCapability; 
    Procedure Reverse ... 
        // code using only primary Stack operations 
    end Reverse; 
end LocalStackImpl; 

To compose a package that provides Stack of information trees and necessary 
Stack secondary operations, a facility (or instantiation) declaration such as the one 
shown below can be used, where Info is some arbitrary type: 

Facility InfoStackFac is StackTemplate(Info) 
    realized by ArrayImpl 
    enhanced by ReversalCapability 
        realized by LocalStackImpl 
    enhanced by PeekCapability 
        realized by PeekImpl(InfoCopyOp); 

In the instantiation above, the client has made suitable feature and 
implementation choices. RESOLVE allows interfaces and implementations of base 
concepts and enhancements to be parameterized independently. To implement the 
PeekCapability for a Stack containing arbitrary Info type entries, it is necessary 
to have a user-supplied operation to copy entries of Info type. The user who knows 
the type Info must provide this operation as a parameter (only) to the PeekImpl that 
needs to use it. The mechanism discussed here permits feature additions only 
statically; when features are added this way, the overhead of the dynamic proxy 
mechanism may be avoided. 

Using RESOLVE principles as their basis, Weide et al. have shown how to 
facilitate multiple implementations of interfaces and flexible feature selection in 
standard C++ using a combination of (multiple) inheritance and templates [Weide 
00]. Other benefits of developing software using the RESOLVE/C++ discipline, 
including modular reasoning, are discussed in [Hollingsworth 00]. Sridhar, Weide, and 
Bucci discuss a related pattern termed “service facilities” to decouple implementation 
to implementation dependencies in [Sridhar 02], using the Java language as an example. 



5.2 Related Work 

For feature enhancement, one of the best known approaches is the idea of mixins, 
which allow a class that provides some functionality, such as persistence, to be 
combined or mixed into an existing class. Mixins have been used in a number of 
languages including CLOS [Bobrow 87], Flavors [Moon 86], and C++ [Stroustrup 
97]. Smaragdakis has noted that the use of mixins in C++ can be simplified using mixin 
layers [Smaragdakis 02a]. The mixin layer approach facilitates feature composability. 
It modifies the mixin class by making its parent a parameter, which allows the child to 
be used with different parent classes. 

Mixins rely on multiple inheritance, which is not supported in languages such as 
Java. The complexity of multiple inheritance is arguably one of the reasons for new 
language features [Black 03, Hendler 86]. Noting the difficulty of mixins, Hendler 
recommends adding to Flavors a new language-level construct that he also terms 
“enhancements” [Hendler 86], to allow functionality to be added to a 
class independently of its class hierarchy, even though Flavors supports multiple 
inheritance. Hendler’s enhancements, set in the context of a Lisp-like language, allow 
functionality to be added to individual objects rather than classes. They are not based 
on interfaces, and therefore, are quite different from ours. To support feature 
extension while using single inheritance in Smalltalk, a language construct called a 
trait has been proposed in [Black 03]. Traits allow methods, and their 
implementations, to be added to a class independently of the class hierarchy. As with 
our approach, the methods in a trait may not access the internals of the class they are 
composed with. Smalltalk does not provide a clean separation between interfaces and 
implementations, and hence, the notion of enhancements is different from traits. 

A variation of Java based on mixins has been proposed in [Ancona 00]; this 
solution requires modifying the language and compiler. To support feature 
composability in Java, MultiJava and its descendants such as Relaxed MultiJava 
[Millstein 03] include new mechanisms. 

Feature support through new language design is common in the generator 
community [Czarnecki 99], including Draco [Neighbors 89] and GenVoca [Batory 
97]. Combining Java fragments using a generator is the basis of Batory’s current 
implementation of Feature Oriented Programming (FOP) [Batory 03]. 



5.3 Summary 

For library scalability, it is essential to be able to add new features to existing base 
interfaces. In general, base interfaces and features will have multiple implementations 
to provide various performance trade-offs. A user should be able to mix and match 
features and implementation variants, and switch the implementation variants with 
minimal maintenance cost. To provide this flexibility, we have explained a notion of 
enhancements. We have discussed an approach to realize the idea in standard Java. 
As a direction for further exploration, we have outlined a simple mechanism to 
facilitate feature composition directly and more efficiently in languages such as Java. 

Abstract. This paper describes a feature modelling technique aimed at 
modelling the software assets behind a product family. The proposed technique 
is distinctive in five respects. First, it proposes a feature meta-model that avoids 
the need for model normalization found in other meta-models. Second, it uses 
an XSL-based mechanism to express complex composition rules for the 
features. Third, it offers a means to decompose large feature diagrams into 
extensible and self-contained modules. Fourth, it defines an XML-based 
approach to expressing the feature models that offers a low-cost path to the 
development of support tools for building the models. Fifth, it explicitly 
supports both the modelling of the product family and of the applications 
instantiated from it. The paper presents the feature modelling technique in the 
context of an on-going project to build a generative environment for family 
instantiation. The experience from two case studies is also discussed. 



1 Motivation 



Our research group is concerned with software reuse in embedded applications [1]. 
The context within which we work is that of software product families understood as 
sets of applications that can be built from a pool of shared software assets. The 
problem we are addressing is that of automating the instantiation process of a product 
family. The instantiation process is the process whereby the generic assets provided 
by the family are configured to build a particular application within the family 
domain. Our ultimate goal is the creation of a generative environment for family 
instantiation as shown in figure 1. The environment provides a family-independent 
infrastructure, it is customized with the family to be instantiated, and it can 
automatically translate a specification of an application in the family domain into a 
configuration of the family assets that implements it. 

To be of practical use, such an environment must be generic in the sense of being 
able to support different families. It must, in other words, be built upon a family meta- 
model rather than upon a particular family. This avoids the cost of having to develop 
a dedicated generative environment for each target family. The environment must 
consequently be configurable with a model of the target family. Such a model is 
required for two purposes: (1) to parameterize the generative environment with the 
family to be instantiated, and (2) to serve as a basis for specifying the application to 
be instantiated. The family model must therefore contain all the information needed to 
configure the family assets to instantiate a target application and it must allow the 
target application to be specified. The latter can also be expressed by saying that the 
family model must be the basis upon which to build a domain specific language 
(DSL) for the family domain. 

This paper presents our approach to modelling a software product family and to 
building a DSL upon it. Our approach is based on feature modelling techniques and is 
distinctive in five respects. First, it defines a feature meta-model that avoids the need 
for normalization found in other meta-models. Second, it offers an XSL-based 
mechanism to express complex composition rules for the features. Third, it offers a 
means to decompose a large feature diagram into extensible and self-contained 
modules. Fourth, it defines an XML-based approach to expressing the feature models 
that offers a low-cost path to the development of a support tool for building the 
models. Fifth, it explicitly supports both the modelling of the product family and of 
the applications instantiated from it. These contributions are discussed in sections 3 
and 4. Section 5 presents our experience from two case studies. 



[Figure labels: Generative Environment; Application Specification; Family Model; Family-Independent Infrastructure; Application Implementation; Family Software Assets] 

Fig. 1. Generative environment for family instantiation 



2 Feature Models 

In general, a feature model [3, 4, 5, 6, 7, 8] is a description of the relevant 
characteristics of some entity of interest. The most common representation of feature 
models is through FODA-style feature diagrams [3, 4, 5]. A feature diagram is a tree- 
like structure where each node represents a feature and each feature may be described 
by a set of sub-features represented as child nodes. Various conventions have 
evolved to distinguish between mandatory features (features that must appear in all 
applications instantiated from the family) and optional features (features that are 
present only in some family instances). Limited facilities are also available to express 
simple constraints on the legal combinations of features. 

Figure 2 shows an example of a feature diagram for a family representing (much 
simplified) control systems. The diagram states that all control systems in the family 
have a single processor, which is characterized by its internal memory size, and have 
one to four sensors and one or more actuators. Sensors and actuators may have a self- 
test facility (optional feature). Sensors are either speed or position sensors whereas 
actuators can only be position actuators. 

Feature models have traditionally been used to specify the domain of a product 
family. Our usage of feature models is similar but our concern is less the a priori 
specification of a product family than the a posteriori description of the pool of 
software assets that are used to build applications within its domain. Furthermore, the 
emphasis in traditional feature modelling is on product family modelling whereas we 
wish to explicitly support the modelling of the applications instantiated from the 
family. 




Fig. 2. Feature model example 



3 Proposed Modelling Approach 

Feature modelling approaches are usually based on a two-layer structure with a meta- 
modelling level, which defines the types of features that can be used and their 
properties and mutual relationships, and a modelling level where the feature model for 
the entities of interest is constructed. This is adequate when the objective is only to 
model the domain of a product family. In the context of the generative environment of 
figure 1, however, there is a need to model both the family (to parameterize the 
generative environment) and the applications instantiated from it (to express the 
specifications that drive the generation process). Hence, our modelling approach 
explicitly recognizes three levels of modelling: 

- Family Meta-Modelling Level 

- Family Modelling Level 

- Application Modelling Level 
At the family meta-modelling level, the facilities available to describe a family are 
defined. The family meta-model is fixed and family-independent. We describe our 
family meta-model in greater detail in section 4. At the family modelling level, a 
particular family is described. A family model must be an instance of the family meta- 
model. Finally, at the application modelling level, a particular application is described. 
The application model serves as a specification of the application to be instantiated 
from the family. The application model must be an instance of the family model. In 
this sense, the family model can be seen as providing a DSL for describing 
applications in its domain. 

We represent both the family model and the application model as feature models. 
The former describes the mandatory and optional features that may appear in 
applications instantiated from the family. The latter describes the actual features that 
do appear in a particular application. We see the application model as a feature model 
where all features are mandatory and where all variability has been removed. 

Since we treat both the family and the application models as feature models, it 
would in principle be possible to derive them both from a unique feature meta-model. 
However, the characteristics of these two models are rather different and we found 
that it is best to instantiate them from two distinct meta-models. 




Fig. 3. Feature modelling architecture 



Feature models require the definition of a concrete syntax to express them and the 
availability of a tool to support their definition. Several choices are possible. Some 
authors define their own syntax and create their own tool to support it [7, 14]. Other 
authors have proposed UML-based formalisms [8, 15] in order to take advantage of 
existing UML tools. More recently, meta-modelling environments like EMF [10] or 
GME [11, 18] have become available and this was the road that we tried initially 
using a modelling architecture as shown in figure 3. However, we eventually 
abandoned this approach because the top-level meta-model imposed by EMF or GME 
was not sufficiently constraining to express all the aspects of the family and 
application concepts that we wished to incorporate in our models. In particular, we 
found it impossible (at least using the default implementation of the tools) to express 
and enforce constraints on group cardinalities (see section 4). 

The alternative route we took uses XML-based languages to express the feature models 
and XML schemas to express the meta-models. The relationship of instantiation 
between a model and its meta-model is then expressed by saying that the XML-based 
model must be validated by the XML schema that represents its meta-model. The 
resulting modelling architecture is shown in figure 4. Both the family and the 
application models are expressed as XML-based feature models but they are 
instantiated from two different meta-models that take the form of XML schemas. The 
application meta-model is automatically generated from the family model by an XSL 
program (the Application XSD Generator). This is better than having a unique feature 
meta-model since it allows the application meta-model to be finely tuned to the needs 
of each family. The degree of fine-tuning can be roughly quantified by noting that, in 
the case studies described in section 5, the number of elements in the (automatically 
generated) application meta-model is between one and two orders of magnitude larger 
than the number of elements in the family meta-model. 
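
The instantiation relationship described above, in which a model is an instance of its meta-model when the XML document validates against the XML schema, can be sketched with the standard javax.xml.validation API. The schema below is hand-written for illustration and is not the output of the Application XSD Generator; the element names controlSystem, processor, sensor, and actuator are assumptions loosely based on the control system example of figure 2.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ApplicationModelValidation {
    // Hypothetical application meta-model: one processor, one to four
    // sensors, and one or more actuators.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='controlSystem'><xs:complexType><xs:sequence>"
        + "<xs:element name='processor' type='xs:string'/>"
        + "<xs:element name='sensor' type='xs:string' minOccurs='1' maxOccurs='4'/>"
        + "<xs:element name='actuator' type='xs:string' minOccurs='1' maxOccurs='unbounded'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true if the application model validates against the schema.
    static boolean isInstance(String applicationModelXml) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(applicationModelXml)));
            return true;
        } catch (SAXException e) {
            return false; // schema violation or malformed document
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String ok = "<controlSystem><processor/><sensor/><sensor/>"
                  + "<actuator/></controlSystem>";
        String tooManySensors = "<controlSystem><processor/><sensor/><sensor/>"
                  + "<sensor/><sensor/><sensor/><actuator/></controlSystem>";
        System.out.println(isInstance(ok));
        System.out.println(isInstance(tooManySensors));
    }
}
```

Cardinality limits such as "one to four sensors" map directly onto minOccurs and maxOccurs, which is the kind of constraint an XML schema can carry.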




Fig. 4. XML-based feature modelling architecture 



The advantage of the EMF or GME approach is that compliance with their meta- 
meta-model permits use of a standard environment for the construction of the models. 
Our choice, however, has a similar advantage because it lets us use standard XML 
editing tools to express feature models and to enforce the relationship of instantiation 
between models and their meta-models (most XML tools can automatically enforce 
compliance with a user-defined XML schema). Our approach has the additional 
advantage that XSD (the XML-based language used to express XML schemas) allows 
definition of more expressive meta-models than the GME or EMF meta-meta-models. 

Note also that, since an XML schema defines an XML-based language, the 
application meta-model can be seen as defining the DSL that application designers 
must use to specify the applications they wish to instantiate from the family. 

3.1 Feature Composition Rules 

A complete model of a family consists of the list of all the features that may appear in 
applications instantiated from the family, together with a list of the composition rules 
that define the legal combinations of features that may appear in an application. A 
model of an application consists of a list of the features that appear in the application. 
The application represents an instantiation of the family if: (1) its features are a subset 
of the features defined by the family, and (2) its features comply with the composition 
rules defined at family level. Consequently, a complete feature modelling approach 
must offer the means to specify both features and their composition rules. Current 
feature modelling techniques are often weak in the latter respect. They are normally 
well-equipped to express local composition constraints, namely constraints on the 
combinations of sub-features that are children of the same feature. Thus, for instance, 
it is easy to express a constraint that a certain feature can only have one sub-feature or 
that it can only have one sub-feature selected out of two possible options. It is instead 
harder to express global composition constraints, namely constraints based on 
relationships between non-contiguous features in different parts of the feature 
diagram. The FODA notation covered require-exclude relationships (expressing the 
condition that the presence of a certain feature is incompatible with, or requires, the 
presence of another feature in a different part of the feature diagram). With the 
exception of the Consul approach [19], however, more complex kinds of global 
composition constraints are not covered. 
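
The two instantiation conditions stated above can be made concrete with a small sketch. It is an illustration only: features are reduced to plain strings, the feature names and the two require-exclude rules are hypothetical, and the paper's own approach works on XML models rather than on sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class InstantiationCheck {
    // "requires" rule: if 'feature' is selected, 'required' must be selected too.
    static boolean requires(Set<String> app, String feature, String required) {
        return !app.contains(feature) || app.contains(required);
    }

    // "excludes" rule: 'a' and 'b' may not both be selected.
    static boolean excludes(Set<String> app, String a, String b) {
        return !(app.contains(a) && app.contains(b));
    }

    static boolean isInstanceOfFamily(Set<String> family, Set<String> app) {
        // Condition (1): the application's features are a subset of the family's.
        boolean subset = family.containsAll(app);
        // Condition (2): the application complies with the composition rules
        // (both rules here are hypothetical examples).
        boolean rulesHold = requires(app, "actuatorSelfTest", "positionSensor")
                && excludes(app, "speedSensor", "lowMemoryProcessor");
        return subset && rulesHold;
    }

    static Set<String> setOf(String... names) {
        return new HashSet<String>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        Set<String> family = setOf("processor", "lowMemoryProcessor", "speedSensor",
                "positionSensor", "actuator", "actuatorSelfTest");
        Set<String> app = setOf("processor", "positionSensor", "actuator",
                "actuatorSelfTest");
        System.out.println(isInstanceOfFamily(family, app));
    }
}
```

Both rules relate features that would sit in different branches of a feature diagram, which is what distinguishes them from local constraints among siblings.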

At first sight, the approach we sketched in the previous section suffers from similar 
limitations. We use XML documents to express both the family and the application 
models and we automatically derive an XML schema from the family model. The 
relationship of instantiation between the family model and the application model is 
then expressed by saying that the XML document representing the application model 
must be validated by this XML schema. This approach has the virtue of simplicity but 
is limited by the expressive power of an XML schema which allows only 
comparatively simple composition rules to be expressed. In practice, it is again only 
local composition constraints that can be easily expressed. 

The use of an XML-language to express the application model, however, opens the 
way to more sophisticated approaches to expressing global composition constraints. 
Arguably, the most obvious and the most flexible way to do so is to encode general 
constraints as XSL programs that are run on the XML-based model of the application 
and produce an outcome of either “constraint satisfied” or “constraint not satisfied”. 
Such an approach is powerful but has two drawbacks. Firstly, it requires the family 
designer to be proficient in XSL. Secondly, it introduces a dichotomy in the 
modelling approach at family level since a family model would then consist of an 
XML-based feature diagram and an XSL-based constraint-checking program. 

In order to avoid these drawbacks while still exploiting the power of XSL to 
express general feature composition constraints, we selected an alternative approach 
where the composition constraints are themselves expressed through a feature 
diagram of the same kind that is used to model a family. A compiler is then provided 
that translates the constraint model expressed as a feature diagram into an XSL 
program that checks compliance with the constraints at application model level. This 
is illustrated in figure 5. The starting point is the constraint family model. This is a 
family model in the sense that it is an instance of the family meta-model but its 
intention is not, as in the case of “normal” family models, to describe a set of related 
applications but rather to describe a set of constraint models where each constraint 
model describes the global composition constraints that apply to one particular 
family. The Application XSD Generator (see figure 4) is used to construct a 
constraint meta-model from the constraint family model. The model of the global 
composition constraints for a certain family is derived by instantiating this constraint 
meta-model. The resulting constraint model is then compiled by the constraint model 
compiler to construct the application constraint checker. This is the XSL program 
that must be run on an application model (expressed as an XML-based feature 
diagram) to verify that the application complies with all its composition constraints. 



[Figure 5 diagram. Recoverable labels: Family Meta-Model; Application XSD Generator; Feature Meta-Model (XML Schema); Constraint Family Model; XML-Based Feature Diagram; Constraint Meta-Model; Constraint Model]

Fig. 5. Generation of Application Constraint Checker 



In summary, a family is characterized by two types of models: 

- A family model (see figure 4) that describes the mandatory and potential features of 
applications within the family domain together with their local composition 
constraints, 

- A constraint model (see figure 5) that describes the global composition constraints 
on the family features. 

The latter model is a type of application model and can therefore be constructed in the 
same environment in which application designers build the models of their 
applications. The definition of the global composition constraints is thus embedded in 
the same environment as the definition of the family and application model. 



3.2 Feature Macros 

We have introduced a mechanism to split a large feature diagram into smaller 
modules that can be used independently of each other. These modules are called 
feature macros. Feature macros resemble the macro facilities provided by some 
programming languages to encapsulate segments of code that can then be “rolled out” 
at several places in the same program. A feature macro represents a part of a feature 
diagram consisting of a node together with all its sub-nodes. The family meta-model 
allows a feature macro to be used wherever a feature can be used. A large feature 
diagram can thus be constructed as a set of independently developed modules. Note 
that the same feature macro can be used at different points in a feature diagram (see 
figure 6 for an example). Feature macros thus provide both modularity and reuse. 

The feature macro mechanism is similar to the module concept of [7] but it goes 
beyond it because, in order to enhance reuse potential, we have added to it an 
inheritance-like extension mechanism. Given a feature macro B (for “Base”), a 
second feature macro D (for “Derived”) can be defined that extends B. Feature macro 
D is an extension of feature macro B in the sense that it adds new features to the 
feature sub-tree defined by B. This allows a form of reuse where a feature sub-tree 
can be parameterized and instantiated for use in different parts of the same feature 
diagram with different requirements. This is more flexible than just allowing a feature 
sub-tree to be used in the same form at different places in a feature diagram. Finally, 
note that the analogy with inheritance is only partial because we do not offer the 
possibility of overriding existing sub-features in a base feature sub-tree: new sub- 
features can be added but existing ones cannot be deleted or modified. 

Fig. 6. Feature macro example 



4 Family Meta-level 

There are two items at the family meta-level: the family meta-model and the 
application XSD generator (see figure 4). They are described in the next two sub- 
sections. Additionally, section 4.3 describes the constraint family model (see figure 5) 
that serves as the basis for the expression of global composition constraints. 

4.1 Family Meta-model 

Figure 7 shows our family meta-model. The basic ideas behind it can be summarized 
as follows. A feature can have sub-features but the connection between a feature and 
its sub-features is mediated by the group. A group gathers together a set of features 
that are child features of some other feature and that are subject to a local 
composition constraint (see section 3.1). Thus, a group represents a cluster of 
features that are children of the same feature and that obey some constraint on their 
legal combination. The same feature can have several groups attached to it. Both 
features and groups have cardinalities. The cardinality of a feature defines the number 
of instances of the feature that can appear in an application. The cardinality of a group 
defines the number of features chosen from within the group that can be instantiated 
in an application. Cardinalities can be expressed either as fixed values or as ranges of 
values. The distinction between group cardinality (the number of distinct features 
within the group that can be instantiated in an application) and feature cardinality (the 
number of times the feature can be instantiated in an application) avoids the 
multiplicity of representation and the consequent need for normalization found in the 
feature meta-models proposed by other authors [5, 7, 16, 17]. 

Fig. 7. Family meta-model 



The meta-model of figure 7 also covers the feature macro mechanism (section 3.2). It 
allows a feature macro to be used wherever a feature can be used and defines two 
mechanisms for extending feature macros: (1) a derived feature macro can add a new 
group to its base feature macro, and (2) a derived feature macro can add a new feature 
to one of the groups of its base feature macro. 
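
The two extension mechanisms can be illustrated with a small sketch. It is written in Python rather than in the XML notation used by the authors, and the class names (Group, FeatureMacro) and the feature names (Sensor, SelfTest, Calibration, Serial) are hypothetical, chosen only for illustration. The sketch assumes that an extension may add groups or features but may never replace anything defined in the base macro.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A cluster of sibling features subject to one local constraint."""
    name: str
    features: list = field(default_factory=list)  # feature names only, for brevity


@dataclass
class FeatureMacro:
    """A feature sub-tree: a root node plus the groups attached to it."""
    name: str
    groups: dict = field(default_factory=dict)  # group name -> Group

    def extend(self, name, new_groups=(), new_features=None):
        """Derive a macro that only adds to this one; nothing is overridden."""
        derived = FeatureMacro(
            name,
            {g.name: Group(g.name, list(g.features)) for g in self.groups.values()},
        )
        # Mechanism (1): add whole new groups to the base macro.
        for group in new_groups:
            if group.name in derived.groups:
                raise ValueError(f"group {group.name!r} already exists in base")
            derived.groups[group.name] = Group(group.name, list(group.features))
        # Mechanism (2): add new features to groups inherited from the base.
        for group_name, features in (new_features or {}).items():
            target = derived.groups[group_name]  # KeyError if the base lacks it
            for feature in features:
                if feature in target.features:
                    raise ValueError(f"feature {feature!r} would override the base")
                target.features.append(feature)
        return derived


# Hypothetical base macro B and derived macro D.
base = FeatureMacro("Sensor", {"Capabilities": Group("Capabilities", ["SelfTest"])})
derived = base.extend(
    "PositionSensor",
    new_groups=[Group("Interface", ["Serial"])],
    new_features={"Capabilities": ["Calibration"]},
)
```

The base macro is copied before it is extended, so the same base can be instantiated unchanged elsewhere in the diagram.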

Finally, our feature meta-model attaches a type element to each feature. Its 
structure is more complex than is shown in figure 7 or than can be discussed in this 
paper. Here it will suffice to say that features can be of two forms: toggle features or 
valued features. Toggle features are features that are either present or absent in an 
application. Valued features are features that, if they are present in an application, 
must have a value attached to them. For instance, the self-test feature of figure 6 is a 
toggle feature (a sensor is either capable of performing a self-test or it isn’t). The 
memory size feature instead is a valued feature because its instantiation requires the 
application designer to specify a value for it (the actual internal memory size of the 
processor). The type information discriminates between toggle and valued features 
and, in the case of valued features, it defines the type of the value. 

Figure 8 shows an example of a family model. Feature ControlSystem has three 
groups: Sensors, Actuators, and Processors. Group Processors has 
cardinality 1 and has only one sub-feature with default cardinality 1. This means that 
the sub-feature is mandatory (it must be present in all applications instantiated from 
the family). Since the feature cardinality is 1, only one instance of the feature 
may appear. Group STypes has two sub-features and cardinality 1. This means that 
its sub-features are mutually exclusive. The cardinality of the Sensor feature is a 
range cardinality [1..4], which implies that the feature can be present in an application 
with between one and four instances. 
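
The interplay of group and feature cardinalities can be made concrete with a short sketch. It is a Python illustration, not the XML schema mechanism used by the authors; the function names and the representation of a cardinality as a (min, max) pair are assumptions made for this example.

```python
def in_range(count, cardinality):
    """cardinality is a (min, max) pair; a fixed value n is written (n, n)."""
    low, high = cardinality
    return low <= count <= high


def check_group(instances, members, group_card, feature_cards):
    """Check one group of sibling features.

    instances:     feature name -> number of instances in the application
    members:       names of the features that belong to the group
    group_card:    how many distinct member features may be selected
    feature_cards: feature name -> allowed instance count once selected
                   (features not listed get the default cardinality 1)
    """
    selected = [m for m in members if instances.get(m, 0) > 0]
    if not in_range(len(selected), group_card):
        return False
    return all(
        in_range(instances[m], feature_cards.get(m, (1, 1))) for m in selected
    )
```

With group cardinality (1, 1) and a single member of default cardinality, check_group accepts only applications that contain exactly one instance, which is the "mandatory" case described above. With two members and group cardinality (1, 1), it rejects any application that selects both, which is the "mutually exclusive" case.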




Fig. 8. Family model example 



4.2 Application XSD Generator 

The Application XSD Generator is an XSL program that processes a family model to 
generate a meta-model for the applications instantiated from that family (see figure 4). 
Conceptually, an application model can be seen as a feature model where all 
variability has been removed, namely as a feature model where all features are 
mandatory and where all features have a cardinality of 1. In this sense, the feature 
meta-model generated by the application XSD generator is simpler than the meta-model of 
figure 7 because it does not include the group mechanism and the feature cardinality 
mechanisms. In another sense, however, it is more complex. The family model 
specifies the types of features that can appear in the applications and the local 
composition constraints to which they are subjected. The application XSD generator 
must express these constraints as an XML schema using the XSD language. Basically, 
this is done by mapping each feature group in the family meta-model to an XSD 
group and by constructing an XSD element for each legal combination of features in 
the group. This implies a combinatorial expansion and an exponential increase in the 
size of the application meta-models. It is, however, noteworthy that the computational 
time for applying the XML schema (i.e. the computation time for enforcing the 
application meta-model) need increase only linearly with the number of features in 
the family model. Our experience with application meta-models derived from family 
models containing about a hundred features is that the computational time for 
enforcing the meta-model remains negligible and compliance with the meta-model 
can consequently be checked in real-time and continuously as the user selects new 
application features within a standard XML tool. 
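
The combinatorial expansion mentioned above can be sketched as follows. This is a Python illustration of the counting argument only; it does not reproduce the authors' XSL generator, and the function name and the (min, max) representation of the group cardinality are assumptions.

```python
from itertools import combinations


def legal_combinations(features, group_card):
    """List every subset of `features` whose size lies within group_card=(min, max)."""
    low, high = group_card
    result = []
    for size in range(low, min(high, len(features)) + 1):
        result.extend(combinations(features, size))
    return result
```

For a group of n features with an unrestricted cardinality of (0, n), the function returns 2^n subsets, which is the exponential growth in schema size referred to in the text. A group with cardinality (1, 1) yields only n combinations.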




Fig. 9. Instance of family model of figure 8 



4.3 Constraint Family Model 

Figure 10 shows the constraint family model that we have defined in order to express 
global composition constraints. This model is an instance of the family meta-model of 
figure 7. It covers three types of global constraints. The first two are the traditional 
“requires” and “excludes” constraints [3, 9] which are covered by the Requires and 
Excludes features. The former states that the feature Feature requires the 
RequiredFeatures features to be present. The latter states that the features 
Feature are mutually exclusive. The Custom Condition feature models the 
third type of global constraint. This allows very general XPath expressions to be used 
to express any generic constraint on the combination of features and their values. The 
Element feature represents either a toggle feature or a valued feature. Each 
Element has a Name (by which it is referenced in Condition) and a Value that 
uniquely identifies the feature (or its value) using XPath syntax. The Condition 
feature is expressed as a logical expression of the above elements or as an arithmetic 
expression where some operands are the values of the above elements. 
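
The semantics of the first two constraint types can be stated in a few lines. The sketch below is in Python and stands in for the generated XSL checker, which is not shown in the paper; the function names are hypothetical and an application is reduced to the set of names of its selected features.

```python
def check_requires(selected, feature, required_features):
    """If `feature` is selected, every feature in `required_features` must be too."""
    return feature not in selected or set(required_features) <= set(selected)


def check_excludes(selected, features):
    """At most one of the mutually exclusive `features` may be selected."""
    return len(set(features) & set(selected)) <= 1
```

A Requires constraint is trivially satisfied when the requiring feature is absent, and an Excludes constraint is satisfied when zero or one of the listed features is present.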




Fig. 10. Constraint family model 



Figure 11 shows an example of a complex custom constraint that applies to the 
feature diagram of figure 2. The constraint expresses a logical condition on three 
elements of the feature diagram identified as E1, E2 and E3. It states that the control 
system must have a position sensor with self-test capability, or at least two position 
sensors and a processor with a minimum of 4 kilobytes of memory. This example 
shows that the proposed notation is sufficiently powerful to express very general 
constraints on the features and their values. 
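
A custom condition of this kind can be approximated with the limited XPath support of Python's standard xml.etree.ElementTree module. The sketch below makes several assumptions that are not taken from the paper: the element and attribute names of the application model (ControlSystem, Sensor, kind, SelfTest, Processor, MemorySize) are hypothetical, the memory size is taken to be expressed in kilobytes, and the condition is read as E1 or (E2 and E3).

```python
import xml.etree.ElementTree as ET

# Hypothetical application model; the real XML vocabulary is not given in the text.
APPLICATION = """
<ControlSystem>
  <Sensor kind="position"><SelfTest/></Sensor>
  <Sensor kind="position"/>
  <Processor><MemorySize>2</MemorySize></Processor>
</ControlSystem>
"""


def check_custom_condition(xml_text):
    """Evaluate the condition E1 or (E2 >= 2 and E3 >= 4) on an application model."""
    root = ET.fromstring(xml_text)
    # E1: a position sensor with self-test capability is present.
    e1 = root.find("./Sensor[@kind='position']/SelfTest") is not None
    # E2: number of position sensors in the application.
    e2 = len(root.findall("./Sensor[@kind='position']"))
    # E3: processor memory size (assumed to be in kilobytes); 0 if absent.
    memory = root.findtext("./Processor/MemorySize")
    e3 = int(memory) if memory is not None else 0
    return e1 or (e2 >= 2 and e3 >= 4)
```

Each Element of the constraint model corresponds here to one path expression, and the Condition corresponds to the final boolean expression over the three values.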




Fig. 11. Constraint model example 



5 Case Studies and Future Work 

We have tested the modelling approach described here on two product families built 
around software frameworks we developed in our group (the AOCS Framework [12] 
and the OBS Framework [13]). Four general conclusions can be drawn from this 
experience. The first one relates to the application specification process which we 
found to be extremely easy. This is primarily because the application meta-model that 
is generated from the family model is very constraining and, at each step of the 
application definition process, it presents the application designer with a narrow list of 
choices. This simplifies the specification job and reduces the chances of errors. 

The second conclusion concerns the tool support. The case studies were done using 
the XmlSpy and oXygen tools. These are ordinary XML editing tools but we found 
that the level of support they offer for feature modelling is most satisfactory. The 
definition of the family meta-model was done in an XSD editor. XmlSpy offers 
graphical facilities to support this task and to automatically enforce compliance with 
the XSD syntax. The definition of the family and application models was done in an 
XML editor configured with the XML schema that implements the respective meta- 
models. The tool continuously enforces compliance with the meta-model. Indeed, in 
the case of oXygen, the users are presented with context-dependent choices of the 
legal features and of their legal values at each step of the feature model editing 
process. As we already mentioned, we initially tried to build our modelling approach 
on top of both GME and EMF but our experience is that the XML tools are both 
more powerful in expressing feature models and easier to use. 

Our third finding relates to the feature macro mechanism (see section 3.2). One of 
our family models is rather large. If we had not had a means to break it up into 
smaller and more manageable sub-models, its construction and maintenance would 
certainly have taken significantly longer. We also found that there were a few parts of 
the feature diagram that were sufficiently similar to be treated as instances or 
extensions of the same feature macro. This introduced a degree of reuse in the family 
model that helped contain its complexity. 

Finally, we have experimented with the mechanism for expressing global 
composition constraints (see section 3.1). We believe that our approach avoids the 
rather cumbersome notation which is typical of other approaches where the global 
constraints are incorporated directly into the feature diagram. We additionally 
appreciated the possibility of defining the global constraints using the same notation 
and environment as is used for the definition of the application and family models. 

In summary, we believe that our experience to date demonstrates the soundness of 
our feature modelling approach. Current work builds upon this result to move closer 
to the generative environment for family instantiation outlined in figure 1. The feature 
technique described in this paper allows a family and its applications to be described, 
but the realization of the concept of figure 1 also requires the instantiation process of 
the family to be captured. Our current work is focusing on the development of 
domain-specific instantiation languages (DSILs) that are intended to complement 
DSLs to offer a complete model of an application and of its instantiation process. 



Abstract. GenVoca is a powerful model for component-based product-line 
architectures (PLAs) advocating stepwise refinement as a composition 
principle. This paper introduces a refinement-oriented generative 
language - ReGaL - to implement statically configurable GenVoca PLAs. 
Whereas components are programmed in Java, refinements are programmed 
in ReGaL by means of generic aspects. Applications are themselves 
specified in ReGaL as type equations. ReGaL compiles type equations 
by instantiating and weaving refinement aspects with components 
to synthesize the requested (Java) application. As opposed to template-based 
generative implementations, ReGaL promotes a clean separation 
of components and refinements, hence eliminating code tangling and related 
issues. It also defers the choice of component class composition 
structures until configuration time, which provides added flexibility to 
adapt applications. Besides, its architecture model enforces a clear role-based 
design of components and supports useful architectural patterns. 



1 Introduction 

Traditional Software Engineering aims at developing applications that satisfy the 
contextual requirements of particular customers. This results in applications that 
are narrow in scope and difficult to evolve. An alternative paradigm is Domain 
Engineering which sets out to design entire families of applications for well- 
identified domains (e.g., vertical domains like Billing or horizontal domains like 
Graph Algorithms). Technically, families of applications may be implemented 
as component-based product-line architectures. PLAs support a plug-and-play 
approach to the synthesis of applications by embedding code generators that 
automate the assembling of components. Thus, PLA users/clients only need to 
concern themselves with application configuration (i.e., the formal specification 
of applications), application generation (i.e., the compilation of specifications 
into executable applications) coming for free. 

One PLA model reconciling component reuse and architecture scalability is 
GenVoca [BCS00]. The components of a GenVoca PLA are typed and parameterised 
with predefined interfaces known as realms. Each component exports 
a single realm (its type) and imports zero or more realms (its parameters). 
Realms determine the composability of components: two components can only 
be composed if the import realm of one is the export realm of the other. Since 
a component’s export realm may be imported by many different components, 
components are de facto reusable. Since components exporting the same realm 
are plug-compatible, architectures are scalable: adding a new component to a 
realm (i.e., providing a new implementation) considerably enlarges the application 
configuration space. 

The semantics of composition in GenVoca is that of large-scale step-wise 
refinement [Dis76][Wir71]. Components are large-scale encapsulation units (e.g., 
collections of classes) programmed to refine (e.g., override, extend) the components 
bound to their import interfaces at generation time. Application synthesis 
may be viewed as a progressive stacking process whereby component 
functionalities get refined whenever a new component is stacked. At the class 
level, component stacking gives rise to increasingly broad and deep forests of 
classes. As pointed out in [BCS00], this is essentially the same way collaboration-based 
application frameworks are manually assembled [HHG90][VHN96]. Thus, 
class-based components provide a natural implementation model for role-based 
collaborations, and GenVoca PLAs a solution to the automated synthesis of 
collaboration-based application frameworks. 

GenVoca PLAs have been traditionally implemented using template-based 
idioms such as mixin-layers [SB01][CE00] or template-based extensions of 
Object-Oriented languages [BLS98] [CC01]. In either case, components are pro- 
grammed as template classes, and component composition boils down to tem- 
plate instantiation. This frees developers from the low-level implementation of 
composition mechanisms. One issue though is that component implementations 
mix self-referential code (the collaboration between a component’s classes) and 
refinement code (the refinement of other - yet to be selected - component classes). 
This tangling results in a lack of clarity, and impairs the separate reuse and evo- 
lution of components and refinements. 

Another issue is that the structures or patterns used to connect component 
classes must be hard-coded. Ideally, this design choice should be postponed until 
configuration time to enable the adaptation of applications to different execu- 
tion environments or to evolving business requirements. Inheritance should be 
the default choice from the performance viewpoint but aggregation may be the 
only option if components have to run distributively. Finally, template-based 
solutions also suffer from technical limitations such as the “composition consistency” 
problem or the weak typing of compound applications [SB98] [Ost02]. 

Another approach consists of programming refinements separately from components. 
[PSR00] suggest an Aspect-Oriented implementation by considering refinements 
as component aspects [KLM+97]. Indeed, refinements cross-cut components 
(i.e., they glue components together), and they do so following common 
implementation patterns. Programming refinements as aspects solves the issue 
of component composition - component composition is aspect weaving - while 
leveraging the benefits of modularity and separation. 

Still, [PSR00] only use this technique to implement collaboration-based designs, 
not PLAs. That is, their aspects are programmed to connect predefined 
pairs of components, not any pair of components. Besides, they impose no refinement 
semantics: it is left for aspect programmers to define the meaning of 
component refinement. In this paper, we set out to extend and frame this approach 
to implement statically configurable GenVoca PLAs. To this end, we introduce 
a refinement-oriented generative language - ReGaL - to program generic 
refinement aspects for components written in Java. 

First of all, ReGaL provides an architecture description language. The un- 
derlying architectural model is an extension of GenVoca imposing a fine-grain 
definition of components and realms in terms of role-classes and interfaces. This 
is to enforce a sound role-based design for components. The model also allows 
the declaration of inheritance relationships between realms, which leads to a 
more flexible component composition rule. As a result, ReGaL can accommodate 
a wider range of architectural styles and patterns. 

Any architectural description must be accompanied by Java components and 
ReGaL refinements to complete a PLA implementation. ReGaL requires that a 
refinement aspect be programmed for each component role-class. Each aspect 
consists of advices which encode the logic of refinement at the level of methods. 
The set of advices supported by ReGaL is adapted from AspectJ [KHH+01] to 
the context of refinement. Each aspect is also parameterised with its class’ parent 
class and their connection structure - inheritance or aggregation. Connections 
and structures are selected at configuration time and specified as type equations. 
ReGaL compiles these equations by instantiating and weaving generic refinement 
aspects with components to produce the requested Java application. 

This paper presents ReGaL using a PLA of graph algorithms as a running ex- 
ample. Section 2 introduces the graph algorithm PLA and provides background 
on collaboration-based designs and GenVoca. Section 3 presents the concepts, 
syntax, and compilation model of ReGaL. Section 4 discusses the level of appli- 
cation configurability one may achieve with ReGaL. Section 5 concludes. 

2 Collaboration-Based Designs and GenVoca 

A collaboration-based application framework is a stack of components. Each col- 
laboration embodies a feature of the application domain, and is implemented as 
a component encapsulating classes (or class fragments) whose objects collabo- 
rate to realise that feature. Each class plays a specific role in a collaboration - we 
shall use the term role-class hereafter. The stacking of collaborations has the 
semantics of refinement, that is, each collaboration refines the one it is stacked 
on. Refining a collaboration primarily consists of refining its role-classes individ- 
ually. New, non-refining role-classes may also be introduced. 

Generally speaking, a class Y refines a class X if it is structurally linked to X 
and if it complies with the syntactic and semantic obligations imposed by X. As 
far as structure is concerned, this can be achieved by making Y a sub-class of 
X, aggregating X in Y, or connecting Y to X using delegation patterns. As far as 
behaviour is concerned, this involves mapping the constructors of Y to those of 
X, overriding methods and data of X in Y, and/or introducing new methods in Y. 
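
The two main structural options, sub-classing and aggregation, can be contrasted in a short sketch. It is written in Python purely for illustration (the paper's components are Java classes), and the class and method names (X, YInherits, YAggregates, describe, extra) are hypothetical.

```python
class X:
    """The class being refined."""

    def __init__(self, label):
        self.label = label

    def describe(self):
        return f"X({self.label})"


class YInherits(X):
    """Refinement by sub-classing: Y is linked to X through inheritance."""

    def __init__(self, label):
        super().__init__(label)  # map Y's constructor to X's constructor

    def describe(self):  # override, then forward the call to the parent
        return "refined " + super().describe()

    def extra(self):  # new method introduced by the refinement
        return len(self.label)


class YAggregates:
    """Refinement by aggregation: Y holds an X object and forwards calls to it."""

    def __init__(self, label):
        self._parent = X(label)  # map Y's constructor to X's constructor

    def describe(self):
        return "refined " + self._parent.describe()

    def extra(self):
        return len(self._parent.label)
```

Both variants expose the same behaviour to callers. They differ in that only the inheriting variant is itself an X, while the aggregating variant keeps the parent as a separate object.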

We illustrate these notions by introducing a framework of graph algorithms 
(adapted from [Hol92] and [SB01]) that we will use throughout the paper. We 
consider three algorithms - vertex numbering, cycle checking, and computation 
of connected components - in the case of undirected graphs. First of all, we 
implement undirected graphs as a collaboration - UGraph - consisting of a graph 
role-class - ugG - and a vertex role-class - ugV. The three algorithms require 
a procedure to traverse graphs and a workspace to store computational data. 
We implement these services by two distinct components, respectively, DFT and 
BaseAlgo. 

BaseAlgo refines UGraph by introducing a workspace role-class and extending 
the graph and vertex classes with instance variables pointing to a workspace 
object. DFT refines BaseAlgo by extending the graph and vertex classes with 
depth-first traversal services 1 . Its graph role-class implements the main method 
depthFirst, which is a loop calling graph- and vertex-specific routines every time 
a vertex is visited. These routines and their dependencies are shown in Fig. 1. 

Fig. 1. Calling structure between the graph role-class dfG and the vertex role- 

Fig. 2. A collaboration-based application framework of graph algorithms 



We implement the algorithms as three components - CycleChecking, 
Connected and Numbering. Each refines DFT by overriding some of its traversal 
routines. For instance, Numbering overrides vertexWork to compute an index for 
the invoking vertex object before forwarding the call to the parent object. Fig. 2 
shows a layered design involving the above mentioned components. The design 
is arranged as a matrix with collaborations as rows, roles as columns, and role-classes 
as elements (e.g., ugG implements the GRAPH role in the UGraph collaboration). 

A collaboration-based design is turned into a GenVoca PLA by decoupling 
components and making explicit their import/export interfaces (realms). Com- 
ponents are parameterised with their import realms and typed with their export 
realms. For instance, we organise our graph algorithm components into a Gen- 
Voca PLA using two realms - ALGO and BASE: 

ALGO   DFT[ALGO] | Numbering[ALGO] | CycleChecking[ALGO] | Connected[ALGO] | BaseAlgo[BASE] 
BASE   UGraph 

1 Other types of traversals could be envisaged. 

In this PLA, UGraph exports BASE but imports no realm (it is a terminal component). 
All the other components are plug-compatible because they all export 
ALGO (denoted with ‘|’). BaseAlgo imports BASE (denoted with square brackets), 
whereas the other ALGO components import ALGO itself. Because they import and 
export the same realm, we say they are symmetric components. 

GenVoca applications are specified by means of type equations. A type 
equation is a named composition of components that must be syntactically correct. 
That is, only components that are realm-wise compatible (one’s import 
realm matches the other’s export realm) may be connected. For instance, 
CycleChecking[Connected[Numbering[DFT[BaseAlgo[UGraph]]]]] is a correct type 
equation for the graph algorithm PLA (which describes the framework of Fig. 2). 
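
The realm-wise compatibility check can be sketched as follows. This is a Python illustration under simplifying assumptions: every component has at most one import realm, a type equation is a single chain of nested brackets, and the innermost component must be terminal. The table COMPONENTS transcribes the realm declarations of the graph algorithm PLA; the function names are hypothetical.

```python
# component -> (export realm, import realm or None for a terminal component)
COMPONENTS = {
    "UGraph": ("BASE", None),
    "BaseAlgo": ("ALGO", "BASE"),
    "DFT": ("ALGO", "ALGO"),
    "Numbering": ("ALGO", "ALGO"),
    "CycleChecking": ("ALGO", "ALGO"),
    "Connected": ("ALGO", "ALGO"),
}


def parse(equation):
    """Turn 'A[B[C]]' into the list ['A', 'B', 'C'] (outermost first)."""
    text = equation.replace(" ", "")
    depth = text.count("[")
    if text.count("]") != depth or not text.endswith("]" * depth):
        raise ValueError("unbalanced brackets")
    return text.rstrip("]").split("[")


def is_correct(equation):
    """Each component must import exactly the realm exported by the one it wraps."""
    names = parse(equation)
    for outer, inner in zip(names, names[1:]):
        if COMPONENTS[outer][1] != COMPONENTS[inner][0]:
            return False
    # The innermost component must be terminal (it imports no realm).
    return COMPONENTS[names[-1]][1] is None
```

Under these assumptions the equation quoted above is accepted, whereas DFT[UGraph] is rejected because DFT imports ALGO while UGraph exports BASE.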

3 ReGaL 

We present ReGaL through the implementation of the graph algorithm PLA. 
This implementation involves describing the architecture in ReGaL, program- 
ming refinements in ReGaL, and programming components in Java. Java appli- 
cations are generated by writing and compiling type equation programs with the 
ReGaL and Java programs. 

3.1 Component Model 

ReGaL supports an architectural model which extends GenVoca in two ways. 
First of all, the model imposes a fine-grained definition of realms and components 
based on interfaces and role-classes. Realms must be defined as sets of interfaces 
and components as sets of role-classes, where each role-class exports a single 
interface and may import another (possibly the same). This is illustrated in 
Fig. 3 which shows a ReGaL description of the graph algorithm PLA. 



    interface GRAPH, GRAPH_ALGO->GRAPH,        // GRAPH_ALGO extends GRAPH
              VERTEX, VERTEX_ALGO->VERTEX,     // VERTEX_ALGO extends VERTEX
              WORKSPACE;

    realm BASE{GRAPH, VERTEX}, ALGO{GRAPH_ALGO, VERTEX_ALGO, WORKSPACE};

    BASE UGraph {
        GRAPH        ugG;                      // graph role of UGraph
        VERTEX       ugV; };                   // vertex role of UGraph

    ALGO BaseAlgo [BASE] {
        GRAPH_ALGO   bsG [GRAPH];              // graph role of BaseAlgo
        VERTEX_ALGO  bsV [VERTEX];             // vertex role of BaseAlgo
        WORKSPACE    bsW; };                   // workspace role of BaseAlgo

    ALGO DFT [ALGO] {
        GRAPH_ALGO   dfG [GRAPH_ALGO];         // graph role of DFT
        VERTEX_ALGO  dfV [VERTEX_ALGO];        // vertex role of DFT
        WORKSPACE    dfW [WORKSPACE]; };       // workspace role of DFT



Fig. 3. The (partial) ReGaL description of the graph algorithm PLA 



The realms BASE and ALGO are the sets of interfaces {GRAPH, VERTEX} and 
{GRAPH_ALGO, VERTEX_ALGO, WORKSPACE}, respectively. Components have the same 
role-classes as those shown in Fig. 2. For instance, BaseAlgo is the set of 
role-classes {bsG, bsV, bsW} importing BASE and exporting ALGO. Precisely, the role-class 
bsG exports GRAPH_ALGO and imports GRAPH, bsV exports VERTEX_ALGO and imports 
VERTEX, and bsW exports WORKSPACE. 2 The other ALGO components have symmetric 
role-classes, e.g., nbV exports and imports VERTEX_ALGO. 

Secondly, ReGaL allows the declaration of extension (inheritance) relationships 
between interfaces. For instance, GRAPH_ALGO is declared to extend GRAPH in 
Fig. 3. Note that multiple inheritance is prohibited. These declarations implicitly 
translate into extension relationships between realms. ReGaL uses the following 
rule: a realm E extends a realm B if every interface of B either belongs to E, or has 
a unique extension in E. For instance, it follows from the declarations of Fig. 3 
that ALGO extends BASE. 
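
The realm extension rule can be written down directly. The sketch below is a Python illustration, not part of ReGaL. The tables EXTENDS and REALMS transcribe the declarations of Fig. 3, the function names are hypothetical, and interface extension is assumed to be transitive, which the text does not state explicitly.

```python
# interface -> the single interface it extends (multiple inheritance is prohibited)
EXTENDS = {"GRAPH_ALGO": "GRAPH", "VERTEX_ALGO": "VERTEX"}

REALMS = {
    "BASE": {"GRAPH", "VERTEX"},
    "ALGO": {"GRAPH_ALGO", "VERTEX_ALGO", "WORKSPACE"},
}


def is_extension_of(interface, ancestor):
    """True if `interface` extends `ancestor` directly or transitively."""
    current = EXTENDS.get(interface)
    while current is not None:
        if current == ancestor:
            return True
        current = EXTENDS.get(current)
    return False


def realm_extends(e, b):
    """E extends B if each interface of B is in E or has exactly one extension in E."""
    for interface in REALMS[b]:
        if interface in REALMS[e]:
            continue
        extensions = [i for i in REALMS[e] if is_extension_of(i, interface)]
        if len(extensions) != 1:
            return False
    return True
```

With these tables, GRAPH and VERTEX each have exactly one extension in ALGO, so realm_extends("ALGO", "BASE") holds, while the converse does not because WORKSPACE has no counterpart in BASE.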

ReGaL extends the component composition rule of GenVoca to accommodate 
realm extensions. Precisely, two components are compatible if one’s export realm 
is equal to, or is an extension of, one of the other’s import realms. This policy 
remains consistent with GenVoca, i.e., the GenVoca composition policy applies to 
PLAs that do not feature any realm extension. The advantage lies in an increased 
expressiveness which allows the specification of a wider range of architectural 
patterns. For instance, ReGaL supports architectures featuring optional realms 
[CE00]. Consider the following PLA: 

BASE c {...}; 

OPT o [BASE] {...}; // and OPT extends BASE 

REQ r [BASE] {...}; 

Both r[o[c]] and r[c] are correct ReGaL type equations since OPT extends
BASE. So OPT’s components may be omitted in type equations, that is, OPT is
optional. If the composition policy of ReGaL did not accommodate realm
extensions, one would have to design a PLA by collapsing OPT and BASE into
one realm to achieve the same effect. This would blur the design as realms
would no longer map to domain features. Note that ReGaL supports other
patterns such as adaptor components, e.g., “consumer-producer” components
turning producer role-classes into consumer role-classes and vice versa [SB01].
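
The optional-realm example can be checked with a small Java sketch of the compatibility rule. The names `TypeEquationCheck`, `Component` and `isValid` are hypothetical, each component is simplified to a single import realm, and REQ is an assumed name for the export realm of r. The flag `allowExtensions` switches between the ReGaL rule and strict realm equality.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Illustrative checker (hypothetical, not the ReGaL generator).
class TypeEquationCheck {

    static final class Component {
        final String name;
        final String exportRealm;
        final String importRealm; // null for a terminal component

        Component(String name, String exportRealm, String importRealm) {
            this.name = name;
            this.exportRealm = exportRealm;
            this.importRealm = importRealm;
        }
    }

    // True if realm equals target or reaches it through declared realm extensions.
    static boolean equalsOrExtends(String realm, String target, Map<String, String> realmParent) {
        for (String r = realm; r != null; r = realmParent.get(r)) {
            if (r.equals(target)) {
                return true;
            }
        }
        return false;
    }

    // chain lists the components outermost first, e.g. r[o[c]] is [r, o, c].
    static boolean isValid(List<Component> chain, Map<String, String> realmParent,
                           boolean allowExtensions) {
        for (int i = 0; i + 1 < chain.size(); i++) {
            Component outer = chain.get(i);
            Component inner = chain.get(i + 1);
            if (outer.importRealm == null) {
                return false; // a terminal component cannot wrap another one
            }
            boolean compatible = allowExtensions
                    ? equalsOrExtends(inner.exportRealm, outer.importRealm, realmParent)
                    : inner.exportRealm.equals(outer.importRealm);
            if (!compatible) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Component c = new Component("c", "BASE", null);
        Component o = new Component("o", "OPT", "BASE");
        Component r = new Component("r", "REQ", "BASE");
        Map<String, String> realmParent = Map.of("OPT", "BASE"); // OPT extends BASE
        System.out.println("r[o[c]] with extensions: " + isValid(Arrays.asList(r, o, c), realmParent, true));
        System.out.println("r[c] with extensions: " + isValid(Arrays.asList(r, c), realmParent, true));
        System.out.println("r[o[c]] with strict equality: " + isValid(Arrays.asList(r, o, c), realmParent, false));
    }
}
```

With extensions enabled both equations are accepted; with strict equality r[o[c]] is rejected because o exports OPT while r imports BASE.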

3.2 Type Equations 

The type equation program may contain more than one type equation. ReGaL
extends GenVoca again by letting users specify the type of connection used
between role-classes, namely, inheritance or aggregation. Syntactically, each
component occurrence in a type equation is followed by a tuple of values whose n-th
element refers to the connection between the n-th role-class of the component³
and its parent. By convention, i refers to inheritance and a to aggregation. The
equation all below corresponds to the framework of Fig. 2:

CycleChecking<i,i,i>[Connected<i,i,i>[Numbering<i,i,i>[DFT<a,a,a>[BaseAlgo<a,a>[UGraph]]]]]
all;



² bsW does not import any interface: it is terminal.

³ According to the ordering of role-class declarations in the refinement program.

3.3 Interfaces

Realm interfaces are defined in the refinement program. They expose the meth- 
ods that “exporting” (Java) role-classes must implement and that “importing” 
role-classes may access in their (future) parents. Logically, private methods 
should not be made accessible to other role-classes. For this reason, realm in- 
terface methods may only be declared public or protected (see e.g. the realm 
interface VERTEX_ALGO in Fig. 4(a)). To publish the methods of a role-class that 
are private or specific (that is, not implemented by other role-classes), ReGaL 
proposes role-specific interfaces. 

ReGaL actually requires that any private or specific method refining the 
parent role-class be exposed in a role-specific interface. This guarantees that any 
refining method either appears in a realm interface or a role-specific interface. 
The rationale is that refining methods are the methods to advise, hence the need
to make them visible to aspect programmers and easily identifiable by generators. 
Fig. 4(b) shows the specific interface nbVI of the role-class nbV of Numbering. 

3.4 Role-Classes 

Role-classes and components are implemented in Java - the former as classes, 
the latter as packages of role-classes. The rules governing the implementation 
are the following. Firstly, each role-class must implement the methods of its 
realm and role-specific interfaces. The interfaces are assumed to be in scope, as 
if they were true Java interfaces. Secondly, the bodies of refining methods must 
be stripped of any reference to the (future) parent class. As explained below, 
these refinement instructions are programmed separately using aspect advices 
and woven at composition time. Constructors must also be implemented without 
assuming the existence of parent role-classes, except when a role-class extends 
another role-class in its own component. 

3.5 Refinement Aspects 

Generic refinement aspects are part of the refinement program and implement 
the refinement logic of role-classes at the level of methods. ReGaL imposes that 
one aspect be programmed per non-terminal role-class. A role-class’ refinement 
aspect consists of refinement advices that advise the methods of its export realm 
and role-specific interfaces. Each advice applies to a single method and specifies 
instructions to execute when the body of the method executes. 

ReGaL supports before, after, and around refinement advices. Their semantics
are those of AspectJ’s before, after, and around advices on method execution
pointcuts (only). Syntactically, execution pointcuts need not be specified; only
the type of the advice and the full method signature are required. Further, a
specific construct - getParent() - is provided to access the object of the future
parent role-class from within advice bodies. Fig. 4(d) shows the refinement
aspect programmed for the symmetric role-class nbV exporting VERTEX_ALGO and
nbVI.

(a)
interface VERTEX_ALGO { // realm interface
    protected void edgeWork(VERTEX);
    protected void vertexWork();
    public int getNeighbours();
    protected boolean stop(); ...
}

(b)
interface nbV::nbVI { // specific interface
    public int getVertexNumber();
}

(c)
class nbV { // role-class
    public int getVertexNumber() { return _id; }
    protected void edgeWork(VERTEX_ALGO) {}
    protected void vertexWork() {
        nbW w = (nbW) getWorkspace();
        _id = w.getValue();
        w.incValue(); }
    public int getNeighbours() { return 0; }
    protected boolean stop() { return false; }
}

(d)
refinement nbV { // refinement aspect
    before (int n, getVertexNumber) {}
    before (edgeWork, VERTEX_ALGO v)
        { getParent().edgeWork(v); }
    before (vertexWork)
        { getParent().vertexWork(); }
    after (int n, getNeighbours)
        { n = getParent().getNeighbours(); }
    after (bool s, stop)
        { s |= getParent().stop(); } ...
}

Fig. 4. A symmetric role-class, its realm and specific interfaces and refinement aspect



The first advice on the role-specific method getVertexNumber has an empty 
body so the method will execute “normally”. The same behaviour could be 
achieved by not advising the method. The advice on edgeWork forwards the call 
to the parent object before edgeWork executes on the invoking object. Indeed, 
edgeWork is a routine called by depthFirst (see Fig. 1) so each variant in a refinement
chain⁴ must be executed. The same applies to vertexWork. The difference
here is that nbV provides a non-empty implementation (which computes an in- 
dex for the invoking vertex). The ordering of execution between the variants of 
vertexWork being irrelevant, programming an after advice would be equally safe. 

The advice on getNeighbours lets this integer function execute normally but 
overrides the return value with that of the parent function. By advising this 
function the same way in each VERTEX_ALGO role-class, the returned value will 
always come from the inner-most role-class (ugV), as required. The last advice 
shows the case where a method’s return value is a function of the values re- 
turned by its variants. Here, the after advice on stop computes an inclusive-or 
on the return values. Indeed, stop is meant to halt a depth-first search when any 
algorithm-specific stopping criterion is met (e.g., the cycle checking component 
halts the search as soon as an edge is traversed for the second time). 
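
The effect of these advices along a refinement chain can be imitated in plain Java, without AspectJ, by letting each variant hold a reference to its parent. This is only a simulation under assumptions: the class `VertexVariant`, its fields and the shortened chain (ugV, dfV, cyV, nbV) are hypothetical, and every variant is given all three methods for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of the advices of Fig. 4(d) under aggregation (illustrative only).
class VertexVariant {
    final String name;
    final VertexVariant parent;   // null for the innermost role-class
    final int ownNeighbours;
    final boolean ownStop;
    final List<String> trace;     // records the order in which bodies run

    VertexVariant(String name, VertexVariant parent, int ownNeighbours,
                  boolean ownStop, List<String> trace) {
        this.name = name;
        this.parent = parent;
        this.ownNeighbours = ownNeighbours;
        this.ownStop = ownStop;
        this.trace = trace;
    }

    // Mimics the "before" advice: forward to the parent, then run the own body.
    void vertexWork() {
        if (parent != null) {
            parent.vertexWork();
        }
        trace.add(name);
    }

    // Mimics the "after" advice that overrides the return value with the parent's.
    int getNeighbours() {
        int n = ownNeighbours;
        if (parent != null) {
            n = parent.getNeighbours();
        }
        return n;
    }

    // Mimics the "after" advice that ORs the own result with the parent's.
    boolean stop() {
        boolean s = ownStop;
        if (parent != null) {
            s |= parent.stop();
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        VertexVariant ugV = new VertexVariant("ugV", null, 3, false, trace);
        VertexVariant dfV = new VertexVariant("dfV", ugV, 0, false, trace);
        VertexVariant cyV = new VertexVariant("cyV", dfV, 0, true, trace);
        VertexVariant nbV = new VertexVariant("nbV", cyV, 0, false, trace);
        nbV.vertexWork();
        System.out.println(trace);                // [ugV, dfV, cyV, nbV]
        System.out.println(nbV.getNeighbours());  // 3, the value of the innermost variant
        System.out.println(nbV.stop());           // true, because cyV requests a stop
    }
}
```

The outermost variant returns the neighbour count of the innermost one and reports a stop as soon as any variant in the chain does, which matches the behaviour described above.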



⁴ Each VERTEX_ALGO role-class stacked on top of nbV contributes a variant.

3.6 Generating Applications

Application generation is a four-step process involving packaging, instantiation,
AspectJ weaving and Java compilation. The generator starts by creating one
package per equation defined in the type equation program. Each equation-package
encapsulates as many component-packages as there are component
occurrences in the corresponding equation. The component-packages are clones of
the original packages and named after their components (renaming conventions
apply if a component appears multiple times in a type equation). The generator
also converts realm and role-specific interfaces into Java interfaces (removing
private and protected methods but preserving extension relationships). Finally,
it places the resulting interfaces in a package named ReGaLInterfaces.

Each role-class admits as many instantiations (variants) as it has occurrences 
in the type equation program. The generator transforms the role-class instantia- 
tions inside the component-packages as follows. First of all, it weaves implemen- 
tation links between the role-class instantiations and the appropriate interfaces 
of ReGaLInterfaces. Then it generates an AspectJ aspect for each role-class in- 
stantiation by transforming the original refinement aspect of the role-class. This 
is performed differently based on the structure used to connect the role-class 
instantiation to its parent. 

In the case of aggregation, the generator declares a reference to the parent
role-class in the aspect and an AspectJ execution pointcut for each advised
method. Then it transforms each refinement advice into an AspectJ advice on
the corresponding method execution pointcut. Finally, it transforms calls to
parents in advice bodies by substituting the parent object for getParent() and
upcasting method arguments to the appropriate parent role-classes. The case of
inheritance is dealt with differently due to AspectJ restrictions. Basically, the
generator uses lightweight aspect classes to implement inheritance links between
role-classes, and it weaves refinement advices directly into the source code of
role-classes. Table 1 shows the aspect generated from nbV’s refinement aspect (see
Fig. 4) when nbV aggregates or inherits from dfV.



Table 1. Generated aspects for aggregation-based and inheritance-based compositions

aggregation-based composition:

    dfV nbV._dfV;

    pointcut vertexWorkExec(nbV v):
        execution(vertexWork(...;
    before(nbV v): vertexWorkExec(v)
        { v._dfV.vertexWork(); }

    pointcut edgeWorkExec(nbV v):
        execution(edgeWork(VERTEX_ALGO));
    after(nbV v, VERTEX_ALGO v1):
        edgeWorkExec(v) { v._dfV.edgeWork((dfV) v1); }

inheritance-based composition:

    declare parents: nbV extends dfV;

    void nbV.vertexWork() { super.vertexWork(); }

    void nbV.edgeWork(VERTEX_ALGO v1)
        { super.edgeWork((dfV) v1); }



The above transformation procedure is correct when no inheritance links exist 
between the role-classes of a component. Otherwise, composition consistency 
must be enforced [Ost02]. Assume that each component of our PLA contains 
a role-class for weighted graphs: ugWG in UGraph, bsWG in BaseAlgo, and so on. 
Further assume that ugWG inherits from ugG in UGraph. At composition time, this 
inheritance link must be redirected so that ugWG inherits from the most refined 
variant of ugG in the synthesized application. For instance, the type equation
DFT[BaseAlgo[UGraph]] must give rise to two refinement chains - (dfWG, bsWG, ugWG)
and (dfG, bsG, ugG) - with ugWG extending dfG. This is the pattern applied by the
generator (known as the Sibling pattern [CBML02]).

Whereas refining methods are handled by generating aspects, constructors 
are handled by direct source code manipulation. The generator is responsible for 
chaining constructors along refinement chains as no predefined pairing exists be- 
tween the constructors of a role-class and those of its parent. The method consists 
of creating p copies of each child constructor if there are p parent constructors. 
Each copy is paired to a parent constructor C by extending its signature with 
C’s parameters, and weaving a call to C. This constructor propagation method 
proceeds top-down along refinement chains as described in [CC01]. 
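
The pairing step can be sketched on constructor signatures alone. In the Java fragment below, which is an illustration and not the generator's code, a constructor is modelled as a list of parameter type names, the chain is listed from the root parent to the outermost child (an assumed reading of "top-down"), and the signatures given to ugG and bsG are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of constructor propagation on signatures only (hypothetical names).
class ConstructorPropagation {

    // chain.get(0) holds the constructors of the root parent; each later entry
    // holds the constructors of the next child. The chain must not be empty.
    // Returns the signatures of the outermost role-class after propagation.
    static List<List<String>> propagate(List<List<List<String>>> chain) {
        List<List<String>> parentCtors = chain.get(0);
        for (int level = 1; level < chain.size(); level++) {
            List<List<String>> paired = new ArrayList<>();
            for (List<String> child : chain.get(level)) {
                // One copy of the child constructor per parent constructor,
                // its signature extended with the parent's parameters.
                for (List<String> parent : parentCtors) {
                    List<String> signature = new ArrayList<>(child);
                    signature.addAll(parent);
                    paired.add(signature);
                }
            }
            parentCtors = paired;
        }
        return parentCtors;
    }

    public static void main(String[] args) {
        // Invented signatures: ugG() and ugG(int); bsG(WORKSPACE).
        List<List<String>> ugG = List.of(List.<String>of(), List.of("int"));
        List<List<String>> bsG = List.of(List.of("WORKSPACE"));
        System.out.println(propagate(List.of(ugG, bsG)));
        // prints [[WORKSPACE], [WORKSPACE, int]]
    }
}
```

With p parent constructors and q child constructors the child ends up with p times q constructors, so the number of generated constructors grows multiplicatively along a chain.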

Once the instantiation stage is complete, the generator invokes the AspectJ 
weaver (version 1.1) to weave the generated AspectJ and component programs. 
The resulting program is then compiled with Java. 

3.7 Using Applications 

Client programs must import the interface and type equation packages to use a 
generated application. They may refer to a role-class instantiation by qualifying 
the name of the role-class with its component and type equation packages, if 
necessary. The program below uses the equation all. 

import all.*; import ReGaLInterfaces.*;

WORKSPACE w = new cyW();
GRAPH_ALGO g = new cyG(w);
VERTEX_ALGO v1 = new cyV(w), v2 = new cyV(w), ...
g.addVertex(v1); g.addVertex(v2); g.addEdge(v1, v2); ...
g.depthFirst(); System.out.println("cyclic : " + g.isCyclic());



4 Configuring Applications 

Configuring applications is a combinatorial task involving advanced knowledge- 
based reasoning [BG97][BCRW00]. On the one hand, it consists of selecting, 
ordering, and composing components in the face of multiple options. On the 
other hand, each decision must satisfy a variety of requirements such as: 

1. interface compatibility constraints, 

2. design rules, e.g., any algorithmic component must inherit from the DFT component
if the latter is higher-level in the refinement chain,

3. user-defined preferences, e.g., cycle checking should be run independently 
from other algorithms. 

Assume we want an application providing simultaneous numbering, cycle
checking and connectivity computation whenever depthFirst is invoked. Further
assume that identical refinement aspects are programmed for DFT and the
algorithmic components which systematically forward calls to parent objects, as
shown in Fig. 4. Achieving simultaneity involves grouping the three components
with DFT, ordering them, and extending BaseAlgo[UGraph] with the resulting
sub-equation.

BaseAlgo and UGraph may be composed in $2^2$ different ways, and
BaseAlgo[UGraph] may be extended with the sub-equation in $2^3$ different ways. The
ordering of components within the sub-equation is free as long as the choice of
composition structures meets the design rule mentioned in point (2) above. For
instance, inheritance must be used throughout the sub-equation if DFT is its
inner-most component. Overall, there are
$2^2 \cdot 2^3 \cdot \sum_{j=0}^{3} 3! \cdot (2^3)^j = 112320$ valid
type equations that achieve the desired service.
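
The total of 112320 can be checked mechanically, reading the count as 2^2 · 2^3 · Σ_{j=0..3} 3! · (2^3)^j. The short Java sketch below only evaluates that expression; the class and variable names are hypothetical and carry no meaning beyond labelling the factors.

```java
// Evaluates 2^2 * 2^3 * sum_{j=0..3} 3! * (2^3)^j (illustrative arithmetic check).
class TypeEquationCount {

    static long count() {
        long firstFactor = 1L << 2;   // 2^2
        long secondFactor = 1L << 3;  // 2^3
        long factorial3 = 3 * 2 * 1;  // 3!
        long sum = 0;
        long power = 1;               // (2^3)^j, starting at j = 0
        for (int j = 0; j <= 3; j++) {
            sum += factorial3 * power;
            power *= 8;
        }
        return firstFactor * secondFactor * sum;
    }

    public static void main(String[] args) {
        System.out.println(count()); // 112320
    }
}
```

The sum evaluates to 6 · (1 + 8 + 64 + 512) = 3510, and 4 · 8 · 3510 = 112320.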

Non-functional criteria are often used to discriminate semantically equivalent 
type equations and composition structures can help in this respect. Suppose that 
performance as measured by the CPU time taken by depthFirst is the main 
criterion. Then inheritance should be preferred to aggregation to avoid the
run-time overhead of indirections. It should be borne in mind, though, that the choice
of composition structures in relationship with the pre-programmed refinement
advices does affect application behaviour. This flexibility, when controlled, can be
exploited with interesting effects.

Assume we want to run cycle checking independently from the other algorithms.
One approach is to extend the PLA with a symmetric ALGO component
Separator. This is a dummy component that neither advises nor overrides
the depth-first traversal routines. By inserting it between the components to
decouple in a type equation and by aggregating it to its parent, the goal is
attained: its traversal routines will not forward any call upwards, and they
will not be reached either by virtual dispatch. For instance, the equation
DFT<a,a,a>[CycleChecking<a,a,a>[Separator<a,a,a>[DFT<a,a,a>[Numbering<a,a,a>[BaseAlgo<a,a>[UGraph]]]]]]
will provide independent cycle checking and numbering services.

5 Conclusion 

We have proposed the use of generic aspects to program component refinements 
when implementing statically configurable GenVoca PLAs. This approach avoids 
some of the problems inherent to template-based approaches such as the code 
tangling between refinements and components and the hard-coding of composi- 
tion structures. Besides, we have shown that supporting programmable refine- 
ment aspects and configurable composition structures facilitates the development 
of highly symmetric PLAs, hence allowing a given service to be synthesised in 
many different ways. This makes the configuration task even more critical. One 
research direction is to turn ReGaL into a true configuration language (e.g., pro- 
viding syntax to capture configuration properties and design rules) and equip its 
generator with automated or interactive configuration capabilities (e.g., search 
and inference mechanisms). This is the direction followed in [LP04].

Abstract. Aspect oriented programming (AOP) seeks to decompose concerns
which crosscut system structure into more manageable modules. However, 
current AOP techniques alone lack the configuration mechanisms and 
generalisation capabilities that are required to realise variability (through clear 
reuse specifications). Conversely, frame technology provides extensive 
mechanisms for providing reuse and configuration yet cannot effectively 
modularise crosscutting concerns. This paper proposes ‘framed aspects’, a
technique and methodology which combines the respective strengths of AOP, 
frame technology and Feature-Oriented Domain Analysis (FODA). We argue 
that framed aspects can alleviate many of the problems the technologies have 
when used in isolation and also provide a framework for implementing fine- 
grained variability. The approach is demonstrated with the design and 
implementation of a generic caching component. 



1 Introduction 

The use of AOP in the production of software is now gaining major backing in the 
software industry [1]. AOP allows for modularisation of concerns that normally cause 
crosscutting in object oriented (OO) systems. However, there are no mechanisms
available in current AOP languages to support and realise fine-grained configurability 
and variability, thus, the potential for aspects to be reused in different contexts is 
limited. This paper demonstrates framed aspects, an AOP independent meta language 
which adds the power of parameterisation, construction time constraint checking and 
conditional compilation to AOP languages. Parameterisation support for AOP allows 
an aspect to be customised for a particular scenario, and therefore increases the 
reusability of an aspect module. Conditional compilation allows for optional and 
alternative variant features of an aspect module to be included or excluded, thus 
resulting in optimal usage of code. Finally, constraints define rules for limiting the 
acceptable values and combinations of features in which an aspect module can be 
created. 

AOP approaches such as AspectJ [2] do not allow the specification for a concern to be 
written as a separate entity from the aspect itself, thus the developer must have an 
intricate understanding of the aspect code and thus cannot treat the aspect in a black 
box manner. The work presented in this paper demonstrates how framed aspects 
address this problem and improve reusability and evolvability for AOP languages. 
We focus on one particular AOP technique (AspectJ) but the concepts and meta
language are generic and therefore applicable to other AOP approaches. The next
section discusses the concept of frame technology and how it can be beneficial to 
AOP. Section three introduces the framed aspect approach. Section four demonstrates 
the framed aspect approach, used in conjunction with AspectJ, to create a reusable 
generic caching component. Section five discusses related work and finally section 
six concludes the paper. 



2 Frames and AOP 

2.1 Frames 

Frame technology [3] was conceived in the late 1970s, and the technology has since 
evolved into XML forms [4]. Frame technology is language independent and allows
useful code to be generalised, adapted and thus configured to different requirements 
and specifications by using code templates organised into a hierarchy of modules 
known as ‘frames’. A developer can then write a specification and by using a frame 
processor create customised systems, components and libraries. An individual frame 
(or group of frames) is the separation of a concern, class, method or related attributes. 
Variability is achieved by allowing variation points, code repetitions, options, slots 
for new functionality etc., to be marked invasively, using meta tags embedded within 
the program code. Typical commands in frames are <set> (sets a variable), <break>
(creates a slot for new functionality to be added at a later date), <select> (selects an
option), <adapt> (refines a frame with new code) and <while> (creates a loop around
code which repeats).



2.2 AOP 

AOP allows concerns which would traditionally crosscut system structure in OO 
systems to be encapsulated into single entities known as aspects. AOP languages, 
such as AspectJ, allow existing modules to be refined statically, in a non-invasive 
manner, using introductions (add new methods, fields, interfaces and superclasses) or 
through injection of additional behaviour in the control flow at runtime via advice. 
Additionally, joinpoints (points of interest to which we add new behaviour) can be 
defined using pointcuts. Fig. 1 demonstrates a Document caching aspect which uses 
a defined pointcut and advice. 

However, varying the aspect for different contexts is difficult due to the lack of 
parameterisation and configuration support. AspectJ does not allow the separation 
between specification and the aspect code. The aspect is, therefore, a white box 
component and an understanding of what the aspect does at code level is required. 



2.3 Comparing Frames and AOP 

Frames and AOP share commonalities, such as the ability to refine modules or add
code to defined points of interest. However, the mechanisms by which they achieve
this differ. The explicit invasive approach employed by
frames, while being very flexible as customisations can be added anywhere in the
code (compared to the restricted join point model in AOP), can lead to poorly
modularised, heavily tagged and hard to maintain code. The strengths and weaknesses
of both technologies are summarised in Table 1. The strengths of one technique are the
weaknesses of the other and vice versa. A hybrid of the two approaches can provide
the combined benefits, thus increasing configurability, modularity, reusability,
evolvability and longevity of the aspects.

aspect SimpleCacheAspect
{
    private int MAX_CACHE_SIZE = 100;
    private int PERC_TO_DEL = 50;
    private Hashtable cache = new Hashtable();

    // Creation of pc1 pointcut on the method requestInfo in Network class.
    pointcut pc1(Editor g, String url): args(g, url) &&
        call(public void Network.requestInfo(Editor, String));

    // Around advice which executes whenever the method defined in pc1 is called.
    // If record doesn't exist in cache, proceed with original call then store the
    // result in the cache. Otherwise populate editor with cached content.
    void around(Editor g, String url): pc1(g, url)
    {
        PageContent cachedPage = (PageContent) cache.get(url);
        if (cachedPage == null)
        {
            proceed(g, url);
            PageContent page = new PageContent(g.getDocument());
            addToCache(url, page);
        }
        else
            g.setDocument(cachedPage.getData());
    }

    // Data structure for storing Document content of editor pane. The access
    // scheme is incremented every time the document is accessed from the cache.
    class PageContent {
        private Document data;
        private int accesses = 0;

        public PageContent(Document d) { data = d; }
        public Document getData() { accesses++; return data; }
        public int getAccesses() { return accesses; }
    }
}

Fig. 1. A simple editor pane caching aspect in AspectJ



Table 1. Comparing frames and AOP

Capability | Framing with OO | AOP
Configuration Mechanism | Very comprehensive configuration possible | Not supported natively, dependent on IDE
Separation of Concern | Only non crosscutting concerns supported | Addresses problems of crosscutting concerns
Templates | Allows code to be generalised to aid reuse in different contexts | Not supported
Code Generation | Construction time mechanism allows generation of code and refactoring via parameterisation | Generates code which (in the case of advice) is bound at run time
Language Independence | Supports any textual document and therefore any language | Constrained to implementation language, although language independent AOP forms exist
Use on Legacy Systems | Limited | Supports evolution of legacy systems at source and byte code level
Variation Point Identification | Invasive breakpoints | Non invasive joinpoints
Dynamic Runtime Evolution | Not supported | Possible in JAC and JMangler. Future versions of AspectJ will have support.

3 Framed Aspects

3.1 Rationale for Framed Aspects 

The framed aspect approach uses AOP to modularise crosscutting and tangled 
concerns and utilises framing to allow those aspects to be parameterised and 
configured to different requirements. Many commands in frames (such as <break>, 
which has before and after forms similar to AOP advice) are in our opinion better 
implemented in AOP languages and in a much cleaner non-invasive way, thus we 
developed the Lancaster Frame Processor (LFP) meta language. LFP is essentially a 
cut down version of the XVCL [4] frame processor (albeit with some added
commands for constraints and a simpler syntax) and utilises only a subset of the 
commands used in traditional framing tools forcing the programmer to use AOP for 
the remainder. This balance of AOP and frames reduces the meta code induced by 
frames (due to their invasive nature) and at the same time provides effective 
parameterisation and reconfiguration support for aspects. We are of the opinion that 
aspects can also have concerns within themselves, especially as aspect modules 
become larger as new variants and features are added. Breaking up the aspect into 
smaller modules helps to localise these inner concerns in a manner in which 
inheritance cannot and also provide a framework for development. Moreover, 
allowing the aspect to be broken into sub-components allows pointcuts, advice, 
introductions and members to be modelled independently from one another as 
opposed to being tightly coupled.¹



3.2 Framed Aspect Composition 

A framed aspect composition is made up of three distinct modules: 

• Framed Aspect Code: This module consists of the normal aspect code and
parameterised aspect code.

• Composition Rules: This module maps out possible legal aspect feature 
compositions, combinations, constraints and controls how these are bound 
together. 

• Specification: Contains the developer’s customisation specifications. The 
developer will usually take an incomplete template specification and fill in the 
options and variables s/he wishes to set. 

Fig. 2 demonstrates how the specification, composition rules and framed aspects
(parameterised aspect code) are bound together in order to generate the source code.

Fig. 2. Framed aspect composition

3.3 Delineating Frames

Creating the framed aspects requires careful consideration of the variants and scope
for which the aspect is intended. The first step after discovering these variants is to
create a feature diagram using FODA [5], which describes the dependencies, options
and alternative characteristics of the feature aspect. The feature approach provides a
natural design method for use with framed aspects, and aspect frames can be deduced
by simply delineating the boundaries between the different options and alternatives in
the model. Figs. 3 (a) - (e) demonstrate how the boundaries are delineated for
features X, Y and (in the case of alternative features) Z. This gives the programmer a
starting point for developing the frames. However, as development progresses, there
might arise a need for new frames to capture code that is duplicated across multiple
modules.

¹ Similar discussions have taken place on the AspectJ users list; cf. G. Kiczales and C. Beust,
10th July 2003.




Fig. 3. Delineating frame boundaries of a) mandatory, b) optional and c) alternative features, 
and frame refactoring showing d) original and e) transformation. 

Fig. 3(d) and Fig. 3(e) demonstrate how alternative variants (Y and Z) could contain
duplicate code (for instance an algorithm), and thus would benefit from an extra layer
(frame J) which contains the common code, or from simply moving the duplicated code
to the parent frame. Moreover, frames can break down large aspect modules into
smaller and, therefore, more manageable modules and hide away less important
information from the main concern. A frame can enhance reusability by allowing a
component (for example, an algorithm) to be framed separately from the main
codebase and thus reused in other contexts. Frame commands are utilised for finer
grained variability, parameterisation and constraints, while AOP is used for
integrating the concern in a non-invasive manner. AOP is also used where a coarser
grained functionality is required or when a particular concern crosscuts multiple
modules. The process is described in much greater detail in section 4 where we
implement a generic caching component.

3.4 Framing Aspects

3.4.1 Utilising Parameterisation 

Parameterisation in frames can be used with any textual representation such as a type 
or object, a method, joinpoint or pointcut designator. We consider this form of 
parameterisation to be very powerful and while languages such as Java 1.5 promise to
bring generics as standard, the frame approach can be much more flexible as any 
programming construct can be parameterised. Parameterisation can also be applied to 
aspects to allow scope of aspect behaviour to be customised, to change the method or 
methods a pointcut is bound to, or introduce new members into a specified class. 



private int MAX_CACHE_SIZE = <@MAX_CACHE_SIZE>;
private int PERC_TO_DEL = <@PERC_TO_DEL>;
pointcut pc1(<@EDITOR_NAME> g, String url): args(g, url) &&
    call(public void <@NETWORK_CLASS>.<@REQUEST_MTHD>
        (<@EDITOR_NAME>, String));

void around(<@EDITOR_NAME> g, String url): pc1(g, url)
{ // impl }
class PageContent
{
    private <@DOC_TYPE> data;
    private int accesses = 0;

    public PageContent(<@DOC_TYPE> d) { data = d; }
    public <@DOC_TYPE> getData() { accesses++; return data; }
    public int getAccesses() { return accesses; }
}



Fig. 4. Parameterised version of simple caching aspect 



Returning to our AspectJ implementation of a web cache (fig. 1) in section 2.3 we 
could apply parameterisation in numerous ways to enhance its reuse as shown in fig. 
4, so that it can be used to store data other than Document, or be used with classes 
other than Editor or Network for example. 
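
To make the idea of textual parameterisation concrete, the following Java sketch substitutes `<@NAME>` placeholders with values taken from a specification. It is not the LFP processor: the class `FrameParams`, the method `instantiate` and the choice to reject unset parameters are assumptions made for the illustration.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative placeholder substitution (hypothetical, not the LFP implementation).
class FrameParams {

    // Matches placeholders of the form <@NAME>, with NAME in capitals and underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("<@([A-Z_]+)>");

    // Replaces every placeholder in the frame text by its value in the specification.
    static String instantiate(String frame, Map<String, String> spec) {
        Matcher m = PLACEHOLDER.matcher(frame);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = spec.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Unset parameter: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String frame = "public PageContent(<@DOC_TYPE> d) { data = d; }";
        System.out.println(instantiate(frame, Map.of("DOC_TYPE", "Document")));
        // prints: public PageContent(Document d) { data = d; }
    }
}
```

Because the substitution is purely textual, the same mechanism applies equally to types, method names and pointcut designators, which is the flexibility the section argues for.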

3.4.2 Utilising Options, Adapts, and Constraints 

Options are used for conditional compilation of optional and alternative features. If an 
option is indicated in the specification frame then the code delineated by the option 
tags in the composition rules or the aspect code is included. Adapts provide the 
framework necessary for controlling the development of framed aspects and binding 
of the frames. Options and adapts are typically used in the composition rules module 
rather than in the aspect code itself. Although this is not a hard and fast rule, we 
believe that adding them directly to aspect code is a sign that parts of the aspect need 
to be refactored to another frame. A new feature added by LFP is the ability to add 
constraints to the aspects. The form of these constraints might be lower and upper 
boundary limits or the requirement that the operation/value required is contained in a 
predefined set. 
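The two forms of constraint described above can be sketched in plain Java as follows.
This is only an illustration of the checks involved: the class FrameConstraints and
its methods are hypothetical, and treating the boundary limits as inclusive is an
assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical constraint checks; all names are illustrative only.
class FrameConstraints {
    // Boundary constraint: the value must lie within [lower, upper] (inclusive by assumption).
    static boolean withinBoundary(int value, int lower, int upper) {
        return value >= lower && value <= upper;
    }

    // Set constraint: the value must be one of the predefined options.
    static boolean inSet(String value, String... allowed) {
        Set<String> options = new HashSet<>(Arrays.asList(allowed));
        return options.contains(value);
    }

    public static void main(String[] args) {
        System.out.println(withinBoundary(50, 25, 100));               // prints: true
        System.out.println(withinBoundary(10, 25, 100));               // prints: false
        System.out.println(inSet("ACCESS", "ACCESS", "DATE", "SIZE")); // prints: true
        System.out.println(inSet("RANDOM", "ACCESS", "DATE", "SIZE")); // prints: false
    }
}
```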

3.4.3 Refactoring Framed Aspects 

Aspect code may need to be refactored so that the required joinpoints are exposed. An
example might be to refactor advice into one or more methods, depending upon the
scenario. Refactoring code in this way can also improve the understandability of the
aspect.

4 Implementing a Generic Cache Using Framed Aspects

In this section we describe our experiences of implementing a server side generic 
caching component using framed aspects. 

4.1 Cache Description 

The cache is designed to be used in either database or web environments and can be 
configured easily to different requirements and situations. 

4.1.1 Database Cache 

The database cache caches the results of SQL queries sent from clients. The cache can
be configured for situations where only read operations (SELECT queries) occur, but
optionally it can also be used where write operations (UPDATE, INSERT, etc.) occur.
When writes are received, the information in the cache needs to be refreshed; there
are two separate update strategies for this:

• Every Write: Every write operation to the database triggers the cache update 
mechanism. 

• Time Based: The cache is refreshed at time intervals. 

4.1.2 Web Cache 

The web cache is responsible for storing web pages and has options for allowing the
cache to be refreshed. The mechanisms that do this differ from those of the database
cache and so are implemented separately. They are:

• Automatic: Whenever a url from a client is received, the cache automatically 
sends a small query to check if the page the url points to is newer than the one in 
the cache. If so, the web page currently in the cache is replaced by the newer 
one. 

• Manual: As above, but instead the client explicitly has to ask for the page in the 
cache to be refreshed. 

4.1.3 Deletion Scheme 

Eventually the cache will become full and there has to be some mechanism in place to 
remove records currently held in the cache in order to free up space. The mechanism 
will be set to a particular strategy: 

• Access: Delete least accessed records 

• Date: Delete oldest records 

• Size: Delete largest records. 
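The three deletion strategies can be sketched in plain Java as alternative orderings
over the cached records. This is a minimal illustration, not the implementation used
in the generic cache: the types CacheRecord and DeletionScheme, the record fields, and
the rounding down of the number of records to delete are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical record type; the fields are assumptions made for this sketch.
class CacheRecord {
    final String key;
    final int accesses;     // how often the record has been read
    final long createdAt;   // creation time in milliseconds
    final int sizeInBytes;  // size of the cached data

    CacheRecord(String key, int accesses, long createdAt, int sizeInBytes) {
        this.key = key;
        this.accesses = accesses;
        this.createdAt = createdAt;
        this.sizeInBytes = sizeInBytes;
    }
}

// Each scheme orders records so that the first ones are deleted first.
enum DeletionScheme {
    ACCESS(Comparator.comparingInt((CacheRecord r) -> r.accesses)),             // least accessed first
    DATE(Comparator.comparingLong((CacheRecord r) -> r.createdAt)),             // oldest first
    SIZE(Comparator.comparingInt((CacheRecord r) -> r.sizeInBytes).reversed()); // largest first

    private final Comparator<CacheRecord> deletionOrder;

    DeletionScheme(Comparator<CacheRecord> deletionOrder) {
        this.deletionOrder = deletionOrder;
    }

    // Returns the records to delete: the first percToDel percent in deletion order.
    List<CacheRecord> victims(List<CacheRecord> records, int percToDel) {
        List<CacheRecord> sorted = new ArrayList<>(records);
        sorted.sort(deletionOrder);
        int count = records.size() * percToDel / 100; // rounds down (assumption)
        return new ArrayList<>(sorted.subList(0, count));
    }
}

class DeletionDemo {
    public static void main(String[] args) {
        List<CacheRecord> records = Arrays.asList(
                new CacheRecord("a", 5, 100L, 10),
                new CacheRecord("b", 1, 200L, 30),
                new CacheRecord("c", 3, 50L, 20),
                new CacheRecord("d", 9, 300L, 5));
        for (DeletionScheme scheme : DeletionScheme.values()) {
            StringBuilder keys = new StringBuilder();
            for (CacheRecord r : scheme.victims(records, 50)) {
                keys.append(r.key).append(' ');
            }
            System.out.println(scheme + ": " + keys.toString().trim());
        }
    }
}
```

With the four sample records and 50 percent to delete, ACCESS selects b and c, DATE
selects c and a, and SIZE selects b and c.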
Fig. 5. Feature model for generic cache




Fig. 6. Delineating frames in the generic cache 



4.2 The Feature Model 

The generic cache is shown in the feature model depicted in fig. 5, which describes 
the possible compositions. From the feature diagram we can then delineate the 
framed aspect modules, shown in fig. 6, using the rules as discussed in section 3.3. 



4.3 Framed Aspect Code Examples 

In this section we illustrate examples of the framed aspect code contained in our 
generic cache. The examples contained herein focus mainly on the database caching 
variant, although the principles are entirely applicable to other features. 

4.3.1 Cache Frame 

The cache frame (fig. 7) contains code that is common to all variant forms of the 
generic cache. To enhance reusability and flexibility we used the framed aspect 
approach to parameterise the cache frame and lower order frames with the following 
variants: 
• MAX_CACHE_SIZE: Sets the maximum number of records the cache will hold.

• PERC_TO_DEL: The percentage of records to delete when the deletion mechanism
is invoked.

• CONN_CLASS: The class which contains the methods for sending the query to
the database and also sending the results back to the client.

• SEND_QUERY: The method which sends the query to the database.

• REPLY_CLIENT: The method which sends the result back to the client.

• DOC_TYPE: The type of information that is being stored in the cache (e.g. 
String, Document, CachedResultSet etc.). 



<frame name="CACHE">

    private int MAX_CACHE_SIZE = <@MAX_CACHE_SIZE>;                   // (1)
    private int PERCENTAGE_TO_DEL = <@PERC_TO_DEL>;
    private Hashtable cache = new Hashtable(MAX_CACHE_SIZE);

    public void addToCache(String key, <@DOC_TYPE> data)
    { /* .. impl .. */ }

    pointcut QUERY_PCT(String key, <@CONN_CLASS> c) :                 // (2)
        this(c) && args(key) &&
        call(public void <@CONN_CLASS>.<@SEND_QUERY>(String));

    pointcut RESULT_PCT(<@DOC_TYPE> data, <@CONN_CLASS> c) :          // (3)
        !within(Cache) && this(c) && args(data) &&
        call(public void <@CONN_CLASS>.<@REPLY_CLIENT>(<@DOC_TYPE>));

    class CacheDS                                                     // (4)
    {
        private <@DOC_TYPE> data;

        public CacheDS(<@DOC_TYPE> d)
        {
            data = d;
        }

        public <@DOC_TYPE> getData()
        {
            return data;
        }
    }









Fig. 7. Cache frame 



Code Description 

1. Sets the size of the cache and percentage to be deleted as set by the parameters in 
the specification. 

2. Creates a pointcut for intercepting the call to the method which executes SQL 
queries on the database. 

3. Creates a pointcut for intercepting the results sent back to the client. 

4. CacheDS is a data structure for storing the cache results. 

4.3.2 Writable Frame 

The writable frame (fig. 8) is an optional variant, used when insertions and updates
are made to the database. The cache will then need to be updated in some manner.
Although this functionality is implemented fully in the lower frames, this frame
contains functionality common to all variants below it in the hierarchy.

Code Description 

1. Pointcut used to trap new instances of CacheDS (data structure for holding the 
result data to be cached). 

2. Pointcut to capture ResultSet from currently executing query. 
3. Advice which adds the tables contained within the query executed by a particular
client to the CacheDS data structure.

4. Advice which captures the ResultSet to obtain the ResultSetMetaData and, 
therefore, the tables used in the resulting query. 

5. Introductions into the CacheDS data structure which adds new fields and 
methods. 

6. Introductions into the current CONN_CLASS to store tables for the current 
executing query. 

This frame demonstrates the strength of the framed aspect approach over frame 
technology and AOP alone, by showing how parameterisation and crosscutting 
refinements can be encapsulated within a single frame. 



<frame name="WRITABLE">

    pointcut DS_INSTANCE_PCT(<@CONN_CLASS> c) : cflow(this(c)) &&          // (1)
        call(CacheDS.new(..));

    pointcut RESULTSET_PCT(<@CONN_CLASS> c) : this(c) &&                   // (2)
        call(ResultSet Statement.executeQuery(String));

    after(<@CONN_CLASS> c) returning (CacheDS cds) : DS_INSTANCE_PCT(c)    // (3)
    {
        cds.setTables(c.getTables());
    }

    after(<@CONN_CLASS> c) returning (ResultSet rs) : RESULTSET_PCT(c)     // (4)
    {
        try
        {
            ResultSetMetaData rsmd = rs.getMetaData();
            c.setTables(getTablesFromMetaData(rsmd));
        }
        catch (SQLException sqle) {}
    }

    private boolean CacheDS.isValid = true;                                // (5)
    private Vector CacheDS.tables;

    public void CacheDS.setTables(Vector v)
    {
        tables = v;
    }

    public void CacheDS.containsTable(String s)
    {
        if (tables.contains(s)) isValid = false;
    }

    public boolean CacheDS.isValid()
    {
        return isValid;
    }

    public Vector CacheDS.getTables()
    {
        return tables;
    }

    private Vector <@CONN_CLASS>.tables = new Vector();                    // (6)

    public void <@CONN_CLASS>.setTables(Vector v)
    {
        tables = v;
    }

    public Vector <@CONN_CLASS>.getTables()
    {
        return tables;
    }

    // methods for getting table names from an SQL query and metadata



Fig. 8. Writable frame 
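The effect of the introductions in fig. 8 can be sketched in plain Java: once woven,
each cached result knows which tables its query read from and can be marked invalid
when one of them is written to. The class CachedResult below is a hypothetical
stand-in for the woven CacheDS, and the table names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for CacheDS after the introductions of fig. 8 (illustrative only).
class CachedResult {
    private final String data;
    private final List<String> tables; // tables the cached query read from
    private boolean valid = true;

    CachedResult(String data, List<String> tables) {
        this.data = data;
        this.tables = tables;
    }

    // Marks the entry as stale if the written table was used by the cached query.
    void containsTable(String writtenTable) {
        if (tables.contains(writtenTable)) {
            valid = false;
        }
    }

    boolean isValid() {
        return valid;
    }

    String getData() {
        return data;
    }
}

class WritableDemo {
    public static void main(String[] args) {
        // Hypothetical cached result of a query that read two tables.
        CachedResult result = new CachedResult("rows", Arrays.asList("customers", "orders"));
        result.containsTable("products"); // write to an unrelated table
        System.out.println(result.isValid()); // prints: true
        result.containsTable("orders");   // write to a table used by the query
        System.out.println(result.isValid()); // prints: false
    }
}
```

In the framed aspect the same fields and methods are added to CacheDS from outside,
so the base class stays unchanged when the writable option is not selected.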
4.3.3 Specification Frame

The developer can write a specification, separate from the aspects, which will adapt 
the framed aspects with the required functionality. Fig. 9 demonstrates a typical 
specification for a database cache. 



<frame name="CACHE_SPEC">

    <!-- (1) -->
    <select option="CACHE_TYPE" value="DATABASE_CACHE" />
    <set var="MAX_CACHE_SIZE" value="1000" />
    <select option="DELETION_SCHEME" value="ACCESS" />
    <set var="PERC_TO_DEL" value="50" />

    <!-- (2) -->
    <set var="CONN_CLASS" value="DBConnection" />
    <set var="SEND_QUERY" value="sendQuery" />
    <set var="REPLY_CLIENT" value="replyToClient" />
    <set var="DOC_TYPE" value="String" />

    <!-- (3) -->
    <select option="WRITABLE" value="TRUE" />
    <select option="DB_UPDATE_SCHEME" value="EVERYWRITE" />

    <!-- (4) -->
    <adapt frame="CACHE_RULES" />





Fig. 9. Typical specification frame for a database cache 



Description 

1. The database cache option is selected for CACHE_TYPE, 1000 query resultsets 
can be stored by setting MAX_CACHE_SIZE, DELETION_SCHEME is set to 
the least accessed option, and PERC_TO_DEL is set to 50%. 

2. CONN_CLASS targets a class called DBConnection, the methods for sending 
queries (sendQuery) to the database and sending the query results back to the 
client (replyToClient) are bound to SEND_QUERY and REPLY_CLIENT 
respectively, while the type of data to be stored in the cache, DOC_TYPE, is 
bound to String. 

3. The WRITABLE option is selected and the EVERYWRITE update scheme is
chosen.

4. Finally the specification is processed by the composition rules defined for the 
cache component to bind the components together. 

4.3.4 Composition Rule Frame 

The composition rules shown in fig. 10 bind the framed aspects together and also
define constraints as to what can be bound.

Description 

The rules shown here consist of: 

1. Constraining meta variables to sets or ranges of possible values.

2. Adapting mandatory features as defined by the specification. 

3. Adapting optional features if selected. 

4. Adaptation rules for the database cache. 

The separation of the composition rules from the main aspect code allows different
rules to be created and enhances the possibility for reuse of the framed aspects in
different contexts.
<frame name="CACHE_RULES">

    <!-- (1) -->
    <constrain var="CACHE_TYPE" toSet="DATABASE_CACHE, WEB_CACHE" />
    <constrain var="PERC_TO_DEL" toBoundary="25,100" />
    <constrain var="DELETION_SCHEME" toSet="ACCESS, DATE, SIZE" />
    <constrain var="DB_UPDATE_SCHEME" toSet="EVERYWRITE, TIME_BASED" />
    <constrain var="WEB_UPDATE_SCHEME" toSet="AUTOMATIC, MANUAL" />

    <!-- (2) -->
    <adapt frame="CACHE_TYPE" />
    <adapt frame="DELETION_SCHEME" />

    <!-- (3) -->
    <option name="MANUAL_CACHE_CLEAR" value="TRUE">
        <adapt frame="MANUAL_CACHE_CLEAR" />
    </option>

    <!-- (4) -->
    <option name="CACHE_TYPE" value="DATABASE_CACHE">
        <option name="WRITABLE" value="TRUE">
            <adapt frame="DB_UPDATE_SCHEME" />
        </option>
    </option>

Fig. 10. Composition rules for generic cache 
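The decision logic expressed by the composition rules can be sketched in plain Java
as a function from a specification to the list of frames that are adapted. This is a
hedged illustration only: the class CompositionRules is hypothetical, and it assumes
that adapting CACHE_TYPE, DELETION_SCHEME or DB_UPDATE_SCHEME means adapting the
frame named by the value selected for that option, which the figure does not state
explicitly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical rendering of the rules in fig. 10; not the frame processor itself.
class CompositionRules {
    static List<String> framesToAdapt(Map<String, String> spec) {
        List<String> frames = new ArrayList<>();
        // Mandatory features (assumption: the selected value names the frame).
        frames.add(spec.get("CACHE_TYPE"));
        frames.add(spec.get("DELETION_SCHEME"));
        // Optional feature: adapted only if selected.
        if ("TRUE".equals(spec.get("MANUAL_CACHE_CLEAR"))) {
            frames.add("MANUAL_CACHE_CLEAR");
        }
        // Nested options: an update scheme applies only to a writable database cache.
        if ("DATABASE_CACHE".equals(spec.get("CACHE_TYPE"))
                && "TRUE".equals(spec.get("WRITABLE"))) {
            frames.add(spec.get("DB_UPDATE_SCHEME"));
        }
        return frames;
    }

    public static void main(String[] args) {
        // Options taken from the specification frame in fig. 9.
        Map<String, String> spec = new HashMap<>();
        spec.put("CACHE_TYPE", "DATABASE_CACHE");
        spec.put("DELETION_SCHEME", "ACCESS");
        spec.put("WRITABLE", "TRUE");
        spec.put("DB_UPDATE_SCHEME", "EVERYWRITE");
        // prints: [DATABASE_CACHE, ACCESS, EVERYWRITE]
        System.out.println(framesToAdapt(spec));
    }
}
```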



5 Related Work 

Template programming, as used in languages such as C++, is a means of creating
generic classes (or functions) and allowing customisations to be made based on
parameters provided by the programmer when instantiating the required class.
However, templates are not supported in languages such as Java and C#, and therefore
not in the vast majority of AOP languages. Generics are a new addition to the Java
language (and also C#) and, like templates, allow the programmer to instantiate
customised classes. However, aspect languages using AspectJ-like instantiation models
cannot take advantage of generic support because aspects, unlike classes, are not
directly instantiated. Moreover, in our opinion, AOP needs a different generic model
from those mentioned above, because the programmer may want to parameterise, for
instance, a method name for use in a pointcut definition or advice. In this respect
our approach offers the only current way to generalise aspects.
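The limitation of generics discussed above can be illustrated with a small plain-Java
sketch. A type parameter covers the DOC_TYPE style of variation, but the method to be
cached cannot be named by a type parameter; here it has to be handed over explicitly
as a function object by the calling code. The classes GenericCache and DemoConnection
are hypothetical and exist only for this illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Generics cover the type of the cached data (the DOC_TYPE style of parameter).
class GenericCache<T> {
    private final Map<String, T> entries = new HashMap<>();
    // The "method name" cannot be a type parameter, so the method is passed as a value.
    private final Function<String, T> loader;
    private int loads = 0;

    GenericCache(Function<String, T> loader) {
        this.loader = loader;
    }

    // Returns the cached value, calling the loader only on a cache miss.
    T get(String key) {
        T value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            loads++;
            entries.put(key, value);
        }
        return value;
    }

    int getLoads() {
        return loads;
    }
}

// Hypothetical connection class standing in for the class bound to CONN_CLASS.
class DemoConnection {
    String sendQuery(String sql) {
        return "result of " + sql;
    }
}

class GenericsDemo {
    public static void main(String[] args) {
        GenericCache<String> cache = new GenericCache<>(new DemoConnection()::sendQuery);
        System.out.println(cache.get("SELECT 1")); // prints: result of SELECT 1
        System.out.println(cache.get("SELECT 1")); // prints: result of SELECT 1
        System.out.println(cache.getLoads());      // prints: 1
    }
}
```

Note that the client must call the cache explicitly in this version, whereas a
parameterised pointcut intercepts the named method without changes to the client.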

Our approach also shares many similarities with feature oriented programming (FOP,
GenVoca et al.) [6], where modules are created as a series of layered refinements;
SALLY [7], which allows AspectJ style introductions to be parameterised; and
Aspectual Collaborations [8], where modular programming and AOP techniques are
combined. In FOP, layers are stacked upon one another, with each layer containing
any number of classes. Upper layers add refinements (new functionality, methods,
classes, etc.) to lower layers by parameterisation. There are commonalities between
FOP, AOP and, in particular, our framed aspect approach. However, only static
crosscutting (introductions) is currently supported within the FOP model, in
contrast to the power of dynamic crosscuts via advice in our framed aspect model.
Our approach brings FOP-style variability to AOP, thereby facilitating the AOP
implementation of product lines. In contrast to SALLY, which allows
introductions to be parameterised, framed aspects allow any AOP construct (pointcut, 
advice, introduction, members, etc.) or technique (Hyper/J [9], AspectJ, etc.), to be 
parameterised. Framed aspects are in some ways similar to Aspectual Collaborations 
as they help build aspects for black-box reuse and also support external composition 
and binding between aspects and base. Recent developments focus on AOP
frameworks such as AspectWerkz [10], JBoss AOP [11] and Nanning Aspects [12]. 
AspectWerkz allows for parameterisation and uses XML for defining aspects, advice 
and introductions. JBoss AOP also uses XML for defining advice, introductions and 
pointcuts but lacks a comprehensive join point model. Nanning has been designed as 
a simple to use AOP framework but, like JBoss AOP, also lacks a rich joinpoint 
model. The key difference in our approach, compared with any of the models 
previously mentioned, is in the language independence of frames, which means that 
there are no constraints as to which AOP technique or programming language 
platform (e.g. Java, C# etc.) is used. 



6 Conclusions 

This paper has demonstrated how AOP can benefit from the parameterisation, generation
and generalisation that frame technology brings. We have demonstrated how frames
can enhance reuse and ease the integration and creation of new features and believe 
the same technique can be applied to different concerns. Framed aspects improve 
upon traditional framing methods by removing a great deal of the meta code that 
frames suffer from and allow crosscutting features to be modularised in a non- 
invasive manner. We can utilise the technique in the creation of reusable component 
libraries or domains which require high levels of reuse, such as product line 
engineering [13], which can benefit from the parameterisation and configurational 
power that framed aspects can bring. Framed aspects improve the integration of 
features that would normally crosscut multiple modules, thus causing severe problems 
with evolution, and resulting in architectural erosion [14]. The technique can also be 
utilised to allow configuration of reusable aspects that can be woven into existing 
systems where the original code may or may not be available, thus allowing frame 
techniques to be used in legacy systems to some degree. 

In this paper we demonstrated how framed aspects could be used with production 
aspects (code included in the final product) in order to increase their reusability. 
However, we also have found that development aspects (e.g. tracing and testing 
aspects) can benefit from framing in order for them to be reused in other contexts or 
across different domains. Future work will involve improving the framing technique
by adding semantic checks and more static type checking, utilising IDE support, and
demonstrating how framed aspects can be used within existing frame based
technologies such as XVCL. Java 1.5 will bring generics as standard and it will be
interesting to view in more detail how this will contrast with the parameterisation 
support available in framed aspects. Due to language independence, the framed aspect 
technique can be used in combination with new AOP languages as they emerge and 
also with existing techniques as they evolve. Utilisation of frames and AOP allows 
features and concerns to be modularised and adapted to different reuse contexts thus 
improving comprehensibility and improving the possibilities for evolution. 
Abstract. A systematic approach for implementing software product lines is
more than just a selection of techniques. The selection should be based on a
systematic analysis of technical requirements and constraints, as well as of the
types of variabilities that occur in a particular application domain and are
relevant for the planned product line (PL). In addition, each technique should
provide a set of guidelines and criteria that support developers in applying the
technique in a systematic and unified way. This paper presents a case study
that was performed to evaluate aspect-oriented programming (AOP) as a PL
implementation technology. The systematic evaluation is organized along a
general evaluation schema for PL implementation technologies.



1 Introduction 

Software development today must meet various demands, such as reducing cost,
effort, and time-to-market, increasing quality, handling complexity and product size,
or satisfying the needs of individual customers. Many software organizations
therefore encounter problems when developing and maintaining a set of separate
software systems, and thus migrate, or plan to migrate, to a product line approach.
A software product line is a family of products designed to take advantage of their
common aspects and predicted variability [1]. The members of a product line are
typically systems in the same application domain; their independent development and
maintenance usually require redundant and effort-intensive activities.

An approach for systematically developing and maintaining product lines is
PuLSE™ (Product Line Software Engineering)¹, which has been used in technology
transfer projects since 1998 [2]. The core of such an approach is the creation of a
product line infrastructure that enables the efficient production of
customer-specific products by explicitly managing their commonalities and
variabilities.

While significant effort has been spent in the product line community on
systematizing the early steps of product line engineering (PLE), that is, scope
definition, domain and feature modeling, and architectural design, less attention
has been paid to the implementation level. There are, of course, many technologies
available for implementing generic components, but they are not well integrated with
PLE. In other words, they target interested programmers but offer little support to
organizations in systematically migrating to PLE. Such support could be achieved by
collecting experience and knowledge on how implementation technologies have
performed in different contexts. The motivation for the PoLITe (Product Line
Implementation Technologies) project [3] was to start collecting this experience and
also to characterize and classify implementation technologies.

¹ PuLSE is a trademark of the Fraunhofer Institute for Experimental Software
Engineering (IESE)

J. Bosch and C. Krueger (Eds.): ICSR 2004, LNCS 3107, pp. 141-156, 2004.

A systematic method for implementing software product lines is more than just a 
selection of techniques. For example, the selection must be based on a systematic 
analysis of technical requirements and constraints, as well as of the types of variabil- 
ities, which occur in a particular application domain and are relevant for the planned 
product line. In addition, each technique must provide a set of guidelines and criteria 
that support developers in applying the techniques in a systematic and unified way. 

Conditional compilation [7], frames [8], template and generative programming [9] 
are examples of such techniques, as well as aspect-oriented programming (AOP) [10]. 

This paper presents a case study that was performed in order to evaluate AOP as a 
product line implementation technology. The study has contributed to the ViSEK 
portal [4], which aims at providing empirically validated knowledge in the general 
field of software engineering. 

In section 2 a general evaluation schema for product line implementation tech- 
nologies is presented. Section 3 then presents the case study applying AOP in the 
context of the larger GoPhone product line. Section 4 analyzes the results of the case 
study in combination with AOP and according to the schema presented before. Re- 
lated work is discussed in section 4.8. Finally, section 6 concludes the paper by giv- 
ing a brief outlook on future work planned. 



2 Evaluation Schema 

Software product lines pose special requirements on implementation technologies, 
which we divide into organizational and technical requirements. This paper, however, 
focuses on technical characteristics only. To technically evaluate product line imple- 
mentation technologies, the contexts of both main activities in product line engineer- 
ing, namely framework and application engineering, must be separately considered: 
the role of a technology while implementing a product line during framework engi- 
neering is significantly different from using a product line implementation while 
creating particular products during application engineering. 

The following table provides an overview of the technical requirements for imple- 
mentation technologies in terms of factors that influence framework and application 
engineering efforts. These factors are further analyzed in the subsequent paragraphs. 
Table 1. Product line activities and associated requirements on implementation technologies

Activity | Effort | Factor
Framework Engineering: implementing reusable code | Effort for making code reusable across the product line (development for reuse) | Reuse techniques, variation types, granularity levels
Framework Engineering: implementing reusable code | Effort for testing reusable code | Testability
Framework Engineering: reacting to evolutionary change | Effort for integrating system-specific code into the product line | Component integration impact
Framework Engineering: reacting to evolutionary change | Effort for adding and removing variations (variability management) | Automation
Framework Engineering: reacting to evolutionary change | Maintenance effort | Reuse techniques
Application Engineering: reusing code | Effort for reusing code to derive a concrete product (development with reuse) | Reuse techniques
Application Engineering: resolving variations | Effort for creating a concrete product line member | Binding time, automation



2.1 Reuse Techniques 

Establishing strategic reuse is for most software product lines one of the main objec- 
tives. The difference between strategic and opportunistic reuse lies in the planning 
and realization of reusable artifacts with respect to anticipated reuse contexts in the 
future. The planning is thereby based on careful analyses of customer needs and tech- 
nology trends. The goal is to identify features that should be made reusable across the 
whole product line, as well as features that are likely to arise as a result of software 
evolution. 

2.1.1 Reuse Across Product Line Members 

Flexibility is the major concern in development for reuse. The implementation tech- 
nology must provide means for making code flexible so that it can reflect the various 
product requirements. This flexibility is usually achieved through generalization and 
decomposition. In [11] the two possibilities are called data-controlled variation and 
module replacement, respectively. 

With generalization the goal is to create generic code that encloses many varia- 
tions, while decomposition separates common from variant features so that on de- 
mand the latter can be attached to the common functionality. 
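The difference between the two approaches can be sketched in Java. The example is
deliberately trivial and entirely hypothetical (a greeting with a formal and a casual
variant); it only contrasts one generic class whose behaviour is selected by data
with a common part to which separate variant modules are attached.

```java
// Generalization (data-controlled variation): one generic class encloses both variants.
class GenericGreeter {
    private final boolean formal; // the data that selects the variant

    GenericGreeter(boolean formal) {
        this.formal = formal;
    }

    String greet(String name) {
        return (formal ? "Dear " : "Hi ") + name;
    }
}

// Decomposition (module replacement): variants live in separate modules.
interface GreetingVariant {
    String prefix();
}

class FormalVariant implements GreetingVariant {
    public String prefix() {
        return "Dear ";
    }
}

class CasualVariant implements GreetingVariant {
    public String prefix() {
        return "Hi ";
    }
}

// Common functionality to which one variant module is attached on demand.
class ComposedGreeter {
    private final GreetingVariant variant;

    ComposedGreeter(GreetingVariant variant) {
        this.variant = variant;
    }

    String greet(String name) {
        return variant.prefix() + name;
    }
}

class ReuseDemo {
    public static void main(String[] args) {
        System.out.println(new GenericGreeter(true).greet("Ada"));                 // prints: Dear Ada
        System.out.println(new ComposedGreeter(new CasualVariant()).greet("Ada")); // prints: Hi Ada
    }
}
```

In the generic version every product carries the code of both variants, while in the
decomposed version a product only needs the variant module that is attached to it.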

Both approaches have well known advantages and disadvantages. Table 2 characterizes
generalization and decomposition in terms of the effort that must be spent on
framework and application engineering respectively.
Table 2. Characterizing generalization and decomposition

Approach | Activity | Effort for
Generalization | Framework Engineering | Making code abstract
Generalization | Application Engineering | Specializing code
Decomposition | Framework Engineering | Separating common from variant functionality
Decomposition | Application Engineering | Integrating common and variant functionality



2.1.2 Reuse over Time 

Another reuse dimension is reuse over time [24], which affects both framework and
application engineering. However, the latter refers to single-system evolution and is
thus not considered here. Subsection 2.5 below describes issues arising when
single-system evolution affects the product line. McGregor [12] distinguishes between
proactive evolution techniques dealing with anticipated evolution, reactive techniques
dealing with evolutionary change, and techniques automating framework configuration
(see subsection 2.7 for more details).

Anticipated evolution is the only kind of evolution that enables a controlled man- 
agement of activities since effort is spent for planning and identifying possible future 
changes and for preparing the system to accommodate the evolution. Nevertheless, it 
is practically impossible to foresee all future changes and predict the effort required 
for realizing them. Hence, unanticipated evolution cannot be fully avoided in prac- 
tice. An ideal implementation technique thus supports both types of evolution. 



2.2 Variation Types 

At the implementation level there are two main types of variability that have to be
supported: positive and negative variability (these terms were first introduced
in [13]).

In the first case functionality is added for creating a product line member while in 
the second case functionality is removed. The degree to which an implementation 
technique can handle different variation types is directly related to the effort required 
for development and reuse. 

Negative variability is typically realized by generalization techniques. Generic 
components typically contain more functionality than actually needed in a specific 
product and for that reason functionality has to be removed. Positive variability is 
typically handled by decomposition techniques. In this case optional functionality is 
separated from the core and therefore it must be integrated, later, during application 
engineering. 

Another issue in this context is the order in which variations are bound; an
implementation technology should allow the programmer to define this order explicitly.

2.3 Granularity

Product line variability may exist at different levels of granularity, ranging from
entire components down to single lines of code. The levels of granularity that a
technology can manage are an important characteristic of implementation approaches.
If a product line architecture defines variation points that cannot be realized by a
given implementation technology, restructuring the system may be necessary. For that
reason, an implementation approach must be flexible and support various levels of
granularity so that misalignments with the variation defined in the architecture or
design are avoided.

When the Unified Modeling Language (UML) is used for describing product line
architectures, its extension mechanisms allow variation points to be modeled
explicitly. For example, the stereotype «variant» may be attached to variant
model elements representing variability as defined in [14]. This leads us to the
identification of the granularity levels shown in the following table. These levels
can be further refined in adherence to the UML notation guide, but this is beyond
the scope of this paper.



Table 3. Granularity levels according to the UML

Classifiers | Relationships | Features | Procedures
Package | Generalization | Attributes | Read-Write Actions
Class | Association | Operations | Computation Actions
Interface | Dependency | Methods | Collection Actions
Data type | | | Messaging Actions
Component | | | Jump Actions
 | | | Composite Actions (Group Actions, Loop Actions, Conditional Actions)



2.4 Testability 

Reuse and testing are tightly coupled because reusable code is thought of as
quality-assured, and this can be reached only through systematic testing. In a
product line context there are special issues that have to be considered during
testing [15].

As described in [16], code is testable when its behavior is observable and
controllable at the code level. The extent to which these properties are reached
depends on the implementation technology. At this point the technology can be judged
against the means it provides for creating testable code as well as test code (i.e.
code that performs tests).



146 M. Anastasopoulos and D. Muthig 



2.5 Integration Impact 

During the evolution of a product line member, functionality is added that, at some
point in time, may become beneficial for other members as well. In this case the
member-specific functionality must be integrated into the product line infrastructure.
The same can happen with externally acquired components (e.g. COTS). The integration
impact must be controllable, which poses an important requirement on the
implementation technology used for creating the system-specific functionality.



2.6 Binding Time 

Binding time refers to the point in time when decisions are bound for the variations, 
after which the behavior of the final software product is fully specified [18]. The 
binding time influences the effort for application engineering and may restrict the 
selection of the implementation technology [19]. The following figure illustrates 
typical binding times in the lifecycle of an application. 




Fig. 1. Binding times in the life cycle of an application



2.7 Automation 

Software product line engineering requires automation support both for framework 
and application engineering. Automation can reduce framework-engineering effort by 
automating the variability management. The latter includes adding and removing 
variations by taking compatibility and configuration knowledge [20] into account. 
During application engineering automation can also play an important role since it 
may simplify the creation of product line members by supporting the resolution of 
variation, for example, by decision models, which are models capturing the decisions 
to be taken while reusing generic product line artifacts. 

The existence of decision models lies more in the responsibility of the general 
product line approach than of the implementation technology. The latter, however, 
also influences the overall achievable level of automation. Integrability of an imple- 
mentation technology into a product line infrastructure is, therefore, also an important 
characteristic. 
3 Case Study

The case study presented is based on a hypothetical mobile phone company, GoPhone
Inc. The study is meant to illustrate (a part of) a mobile phone software product
line in a realistic way².

It often happens that features identified in the early steps of development cannot be
directly mapped to units at the code level. This problem is also known as scattering
[21] and becomes apparent when features are orthogonal to the rest of the system and
so affect a large amount of other functionality at the same time. This typically
happens with non-functional features. Aspect-oriented programming (AOP) has been
conceived to address this situation.

AOP is a technology that has to be considered in product line engineering because 
it enhances, on the one hand, reuse of crosscutting features, and on the other hand, 
supports unanticipated evolution, which according to [24] is one of the fields where 
variability realization techniques fail to support evolution. 

AOP distinguishes between components and aspects [22]: Components represent 
properties that can be encapsulated cleanly; aspects represent properties that cannot 
be encapsulated. Hence, aspects and components crosscut each other in a system’s 
implementation. 

Further concepts of AOP used in this paper are (for more details see [23]): 

• Decomposition: At development-time, AOP decomposes an implementation into 
aspects or components. The aspect weaver is responsible for integrating aspects 
with components later. 

• Development and Production Aspects: AOP is generally used for two purposes. 
First, it “only” facilitates application development by handling functionality sup- 
porting development tasks like debugging, performance tuning or testing. Second, 
AOP is used to realize “real” functionality that is part of a released product as we 
did in the GoPhone case study (see below). 

• Pointcut, advice and inter-type declarations: Pointcuts use type patterns for 
capturing points of concern (joinpoints) in the execution of a program. Advice 
contains the aspect code that is then introduced at these points. Finally, inter-type 
declarations enable the modification of the static structure of a program. 

In the case study, we selected the Java 2 Micro Edition (J2ME) as implementation 
technology despite the fact that, in practice, mobile phones are programmed in C. The 
motivation was to abstract from hardware-specific details and thus make the product- 
line-specific issues more visible and easier to understand. As a side effect, J2ME’s 
phone emulators realistically visualize the running software. 

The programming language of the GoPhone case study is Java and for that reason 
we decided to use AspectJ [23], which nicely integrates with development environ- 
ments like Eclipse [25]. For Eclipse, it provides a plug-in with an aspect editor and 
views for managing crosscuts. Additionally, we have employed Ant [26] for 
automating the build process. Ant is an XML-based make tool, which is also well 
integrated with Eclipse. 

2 The documentation and source code of the case study are available at 
www.software-kompetenz.de. 



3.1 Realizing the Optional T9 Feature 

The input of text with a mobile phone is typically done in a rather uncomfortable 
way, namely with multi-tapping. This means that the user has to press a button 
repeatedly until the right letter appears. T9 is an optional auto-completion feature 
that accelerates text input by predicting the words that the user wants to type [27]. 

The optional T9 feature has been implemented as an aspect that picks up all 
constructor calls to standard text fields, which enable text editing, and replaces 
them with constructor calls to an extended text field with T9 capabilities. The 
effort was kept small because the T9 class inherits from the standard text field, so 
extended methods are called automatically through polymorphism. Implementing a 
custom field that, in J2ME terms, does not extend the text field class is also 
possible. In that case, changing the static structure of the base through inter-type 
declarations, as well as advising at many additional pointcuts, would be necessary. 
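
The constructor-call replacement described above can be illustrated with a small Python analogue. This is not AspectJ and not the code of the case study: the classes TextField and T9TextField, the make_text_field factory and the weave_t9 function are hypothetical stand-ins for the J2ME text field, its T9 subclass and the advice on constructor calls. The sketch only shows why inheritance keeps the base code unchanged.

```python
class TextField:
    """Stand-in for the standard text field (hypothetical, not the J2ME class)."""

    def __init__(self, label, text=""):
        self.label = label
        self.text = text

    def key_pressed(self, key):
        # Multi-tap input is reduced to appending the key, for brevity.
        self.text += key
        return self.text


class T9TextField(TextField):
    """Extended field; it inherits from TextField, so callers need no change."""

    # Tiny illustrative dictionary that maps digit sequences to words.
    WORDS = {"43556": "hello", "4663": "good"}

    def __init__(self, label, text=""):
        super().__init__(label, text)
        self.digits = ""

    def key_pressed(self, key):
        # Overridden method, reached through polymorphism.
        self.digits += key
        self.text = self.WORDS.get(self.digits, self.digits)
        return self.text


# The base code creates text fields only through this reference, which
# plays the role of the intercepted constructor call.
_factory = TextField


def make_text_field(label, text=""):
    return _factory(label, text)


def weave_t9(enabled):
    """Bind the optional feature by redirecting construction to the subclass."""
    global _factory
    _factory = T9TextField if enabled else TextField


def type_message(keys):
    # Base functionality, unaware of whether the T9 feature is present.
    field = make_text_field("Message")
    for key in keys:
        field.key_pressed(key)
    return field.text
```

With the feature unbound, type_message("43556") returns the raw digits; after weave_t9(True) the same base code yields the predicted word, because the overridden key_pressed is selected at run time.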

In general, the implementation and management of the T9 feature and its optional- 
ity could be realized with the selected technology. The next section will analyze the 
case study in detail according to the schema presented above. 



4 Analysis 



4.1 Reuse Techniques 

4.1.1 Reuse Across Product Line Members 

AspectJ uses decomposition at development time to separate components from 
aspects. However, the decomposition is resolved during compilation, when aspect 
code is merged (woven) with component code. 

The effort for application engineering, that is, for integrating common and variant 
functionality, is reduced to a minimum because the aspect weaver does this 
automatically. Significant effort therefore arises only during framework engineering. 

The first step of isolating optional or alternative functionality into aspects is the 
identification of the variation points. This usually does not require much effort under 
the condition that the variability requirements are well understood and the design or 
code documentation is adequate. If this is not the case, aspect mining techniques (e.g. 
[17]) can help in that direction. The next step is to write the aspect code that will 
be introduced at the variation points, or in other words the joinpoints, found before. 
The effort required for this step depends on the granularity level and the type of the 
variation. As we will see later, fine-grained variation inside a method body is hard 
to deal with because AspectJ works basically at method and field level. 

Another effort factor at this point is the question whether the base hierarchy can be 
maintained (see section 3.1 above). In other words, if an aspect needs to break the 
class hierarchy at a joinpoint, the effort may grow considerably. 

4.1.2 Reuse over Time 

Aspect-oriented programming in general accommodates both types of evolution. 
Development for evolution (proactive) is mainly supported through the enhanced 
modularity, which comes with aspect-orientation. Ivar Jacobson [28] shows that the 
progressive change, which is inherent in every software development process, can be 
handled proactively with AOP. On the other hand, AspectJ is an event-based 
mechanism, as opposed to collaboration-based mechanisms, which support 
preplanning by nature [29]. 

Reacting to evolution, as it arises, can be supported by the non-invasive composi- 
tion, which is provided by AOP and particularly by AspectJ. For example, if a new 
crosscutting feature is required, developers add only the corresponding aspect code 
without changing the base system. 

AOP can be seen as a transformation technique, which inherently is reactive [12]. 
Yet predicting the exact effects of AOP transformations can be a challenging task. 



4.2 Variation Types 

AspectJ can support both types of variability, although it is better suited for 
positive variability. Advice code is introduced before, after or instead of component 
code. Negative variability can be achieved at the method level by replacing a method 
with one that provides less functionality. However, AspectJ does not remove the 
original code from the resulting bytecode. The original code will not be executed, 
but it is included in the built product (it is even possible to call this code through 
reflection). 

So, the suitability of AspectJ for negative variability depends on the reason why 
functionality must be removed. If, for example, some features must simply be 
disabled to provide a non-commercial version of a product, AspectJ can be helpful. 
On the other hand, if features have to be removed in order to run the product on a 
resource-constrained device, other variability mechanisms should be considered. 

As far as the order of variation binding is concerned, there are two ways that 
aspects can declare precedence: either through the order in which aspect files are 
passed to the weaver or, more effectively, through special precedence declaration 
forms in the aspect code. 



4.3 Granularity 

AspectJ's basic primitive pointcuts are method and field-related. Attributes, 
operations and methods can be tailored easily and in a non-invasive way. AspectJ can 
alter the static structure of components by introducing new attributes and 
operations, but it can also change the behavior of existing operations, including 
read and write accesses to attributes. 

Relationship, class, interface and data type variability can also be realized with 
AspectJ through inter-type declarations or by dynamically introducing new method 
calls in advice bodies. 

Things get more complex with actions. In our case study we faced this problem 
especially with jump, loop and conditional actions. There is no pointcut in AspectJ 
that can capture a variation at this level, and there is no way to tailor the 
expressions that control an iteration or a conditional block. For such kinds of 
variation, other techniques like Frames are better suited [30]. 



4.4 Testability 

Testability is one of the open issues with AOP [31]. Aspect code always depends on 
the component code and is therefore difficult to test in isolation. Moreover, AspectJ 
weaves at the binary level and thus makes the source of the woven code unavailable, 
so structural testing of woven code becomes difficult. Added to this is the problem 
of aspect correctness: making sure that an aspect will work correctly upon reuse is 
another open issue [32]. 

Although code residing in aspects may not be easy to test, it can be used for testing 
existing code. Development aspects are a good way of enforcing contracts, tracing 
program execution, or injecting faults into a system and observing its reaction. As 
far as product line testing is concerned, the creation of generic test cases based on 
AOP is conceivable. 



4.5 Integration Impact 

The impact of integrating aspect code into an existing system depends on the type 
patterns used. In other words, pointcuts or inter-type declarations that use wildcards 
have a great impact. AspectJ, however, provides tools that show the affected 
components and so enable a better overview and control of the impact. 

Another problem with the integration of aspect code lies in the conflicts that can 
occur. This happens especially when inter-type declarations come into play. An as- 
pect cannot declare a public inter-type member that already exists as a local class 
member. 



4.6 Binding Time 

AspectJ weaves code at the bytecode level. That means that only the aspect code 
needs to be compiled upon weaving. AspectJ compiles and merges the aspect code 
directly with the existing binaries. However, the component code must be recompiled 
when woven aspect code must be removed (negative variability). 



4.7 Automation 

In this work we used the AspectJ and Ant plug-ins for Eclipse to support the decision 
making process when deriving products. For configuring a product, the AspectJ 
plug-in provides an editor, which enables defining different configurations by simply 
checking the boxes of the modules required. By default, the editor unfortunately 
provides check boxes for all files (see Fig. 2). For that reason we employed Ant as a 
more effective way of packaging configurations. 

Fig. 2. T9 Build Configuration (excerpt) 

The targets shown in the following picture reflect different product line features. 
The core target represents the base component code while the other two targets repre- 
sent optional features. The Ant script behind this dialog box interacts with the As- 
pectJ compiler in order to build a product containing the selected features. 

Adding a new feature therefore means adding a new target to the Ant script. The 
corresponding effort is kept small because the only difference from existing targets 
lies in the aspect files that are given to the AspectJ compiler. That means that a 
target template can be reused each time a new aspect is added. 

Ant can also be used to support conditions as well as dependencies between fea- 
tures. The following picture shows the output of the build process when an invalid 
feature combination has been selected. As shown in the picture, the optional function- 
ality is being compiled and then woven with the already compiled core. 

At this point it must be noted that this kind of automation is a simple script-based 
solution and is probably not sufficient for complex scenarios. It can, however, be 
combined with sophisticated configuration management environments like GEARS [36]. 

Fig. 3. Automating the decision making process 




Fig. 4. Supporting Feature Dependencies 
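
The idea of one build target per feature, guarded by feature dependencies, can be sketched independently of Ant. The following Python fragment is only an illustration under stated assumptions: the feature names (t9, t9_learning), the file paths and the function plan_build are hypothetical and do not come from the case study, and the result is merely the list of inputs that a build script would hand to the AspectJ compiler.

```python
# Hypothetical feature table: each feature lists the files that would be
# passed to the weaver and the features it depends on.
FEATURES = {
    "core": {"files": ["src/core"], "requires": []},
    "t9": {"files": ["aspects/T9.aj"], "requires": ["core"]},
    "t9_learning": {"files": ["aspects/T9Learning.aj"], "requires": ["t9"]},
}


class InvalidConfiguration(Exception):
    """Raised when a selection violates a feature dependency."""


def plan_build(selected):
    """Return the ordered compiler inputs for one product line member."""
    selected = set(selected) | {"core"}  # the core is always part of a product
    unknown = selected - set(FEATURES)
    if unknown:
        raise InvalidConfiguration(f"unknown features: {sorted(unknown)}")
    for name in sorted(selected):
        missing = [r for r in FEATURES[name]["requires"] if r not in selected]
        if missing:
            raise InvalidConfiguration(f"{name} requires {missing}")
    # Follow the declaration order of FEATURES so that the core comes first.
    inputs = []
    for name, spec in FEATURES.items():
        if name in selected:
            inputs.extend(spec["files"])
    return inputs
```

Adding a feature then amounts to adding one table entry, which mirrors the reuse of a target template; an invalid combination such as selecting t9_learning without t9 is rejected before any compilation starts.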



4.8 Summary 

The following table summarizes the evaluation results according to the schema of 
Table 1. 

Table 4. Evaluation summary 

| Activity | Effort | AspectJ |
|----------|--------|---------|
| Framework Engineering: Implementing reusable code | Effort for making code reusable across the product line (development for reuse) | Effort increases with: fine granularity; break of base hierarchy |
| | Effort for testing reusable code | Aspect testability is a well-known drawback |
| Framework Engineering: Reacting to evolutionary change | Effort for integrating system-specific code into the product line | Impact increases with: type pattern genericity; introduction assumptions |
| | Effort for adding and removing variations (variability management) | Eclipse integration contributes to automation support |
| | Maintenance effort | Less effort in the maintenance of crosscutting features |
| Application Engineering: Reusing code | Effort for reusing code to derive a concrete product (development with reuse) | Effort is normally kept small |
| Application Engineering: Resolving variations | Effort for creating a concrete product line member | Less effort for compile-time binding; decision support through configuration builder or Ant-based automation |



5 Related Work 

The evaluation of product line implementation technologies has received attention in 
the research community [19, 24, 34]. Yet it is not clear how different technology 
dimensions such as configuration management, component-orientation and 
programming language techniques can be combined for efficiently implementing 
product lines. This constitutes the major goal of the PoLITe project [3]. 

On the other hand the specific selection and evaluation of AOP as a product line 
implementation technology has been the focus of a few papers: 

Lopez-Herrejon and Batory [33] used AspectJ to implement a simple family of graph 
applications. The authors face the problem of illegal feature compositions and 
argue that AspectJ does not support the developer in this direction. Indeed, 
AspectJ does not provide any explicit support, except for its aspect prioritization 
feature, which may help in this situation. However, as shown in section 4.7, we can 
define such rules outside AspectJ, for example in Ant scripts. Moreover, this issue 
can be approached with special-purpose aspect code [35]. 

The next problem encountered by the authors relates to the management of the 
files passed to the weaver. In the graph product line all features are aspects and thus 
deciding which ones are passed to the weaver is an error-prone task. Yet this issue 
can be handled with Ant scripts, as we demonstrated in section 4.7, or with more 
sophisticated tools like GEARS [36]. 

Beuche and Spinczyk [37] demonstrate how AspectC++ (the counterpart of As- 
pectJ for C++) can be employed in a weather station product line. The use of As- 
pectC++ simplified the development process since crosscutting product line features 
could be directly mapped to aspects. Moreover, a major issue of the weather station 
product line was the decision of the level, at which hardware interrupts are controlled. 
AspectC++ enabled configuring the appropriate level and therefore enhanced the 
code reusability for various application needs. 

Finally, Sven Macke in his diploma thesis [38] uses AspectJ as a generative tech- 
nique for the implementation of a search engine. One of the major problems encoun- 
tered was the restriction that aspect code can only be woven with component code 
that resides in source files. The author missed the possibility of weaving with code 
residing in a database as a stored procedure. This led to the important conclusion that 
configuration knowledge and therefore automation support cannot be fully handled 
by using only AspectJ. 



6 Conclusion 

The case study has shown that AOP is especially suitable for variability across several 
components. Whether AOP is also suitable for other kinds of variability and for 
variability inside single components will be the subject of future work. This work 
will also explore whether alternative implementation technologies (e.g. frames) 
handle some kinds of variability better. 

In parallel the integration of AOP into general product line approaches should be 
improved to further enhance the achievable automation level. 



Abstract. In component-based product populations, variability has to 
be described at the component level to be able to benefit from a product 
family approach. As a consequence, composition of components becomes 
very complex. We describe how this complexity can be managed au- 
tomatically. The concepts and techniques presented are the first step 
toward automated management of variability for web-based software de- 
livery. 



1 Introduction 

Variability [13] is often considered at the level of one software product. In a 
product family approach different variants of one product are derived from a set 
of core assets. However, in component-based product populations [14] there is 
no single product: each component may be the entry-point for a certain software 
product (obtained through component composition). 

To let this kind of software products benefit from the product family ap- 
proach, we present formal component descriptions to express component vari- 
ability. To manage the ensuing complexity of configuration and component com- 
position, we present techniques to verify the consistency of these descriptions, 
so that the conditions for correct component composition are guaranteed. 

This paper is structured as follows. In Sect. 2 we first discuss component- 
based product populations and why variability at the component-level is needed. 
Secondly, we propose a Software Knowledge Base (SKB) concept to provide some 
context to our work. We describe the requirements for a SKB and which kind 
of facts it is supposed to store. Section 3 is devoted to exploring the interaction 
of component-level variability with context dependencies. Section 4 presents the 
domain specific language CDL for the description of components with support 
for component-level variability. CDL will serve as a vehicle for the technical 
exposition of Sect. 5. The techniques in that section implement the consistency 
requirements that were identified in Sect. 2. Finally, we provide some concluding 
remarks and outline our future work. 



2 Towards Automated Management of Variability 

Why Component Variability? Software components are units of indepen- 
dent production, acquisition, and deployment [9]. In a product family approach, 
different variants of one system are derived by combining components in differ- 
ent ways. In a component-based product population the notion of one system is 
absent. Many, if not all, components are released as individual products. To be 
able to gain from the product family approach in terms of reuse, variability must 
be interpreted as a component-level concept. This is motivated by two reasons: 

— In component-based product populations no distinction is made between 
component and product. 

— Components as unit of variation are not enough to realize all kinds of con- 
ceivable variability. 

An example may further clarify why component variability is useful in product 
populations. Consider a component for representing syntax trees, called Tree. 
Tree has a number of features that can optionally be enabled. For instance, the 
component can be optimized according to specific requirements. If small memory 
footprint is a requirement, Tree can be configured to employ hash-consing to 
share equal subtrees. Following good design practices, this feature is factored 
out in a separate component, Sharing, which can be reused for objects other 
than syntax trees. Similarly, there is a component Traversal which implements 
generic algorithms for traversing tree-like data structures. Another feature might 
be the logging of debug information. 

The first point to note is that the components Traversal and Sharing 
are products in their own right since they can be used outside the scope of 
Tree. Nevertheless they are required for the operation of Tree depending on 
which variant of Tree is selected. Also, both Traversal and Sharing may have 
variable features in the very same way. 

The second reason for component variability is that not all features of Tree 
can be factored out in component units. For example, the optional logging feature 
is strictly local to Tree and cannot be bound by composition. 

The example shows that the variability of a component may have a close 
relation to component dependencies, and that each component may represent a 
whole family of (sub) systems. 



The Software Knowledge Base. The techniques presented in this paper 
are embedded in the context of an effort to automate component-based software 
delivery for product families, using a Software Knowledge Base (SKB). This SKB 
should enable the web-based configuration, delivery and upgrading of software. 
Since each customer may have her own specific set of requirements, the notion 
of variability plays a crucial role here. 

The SKB is supposed to contain all relevant facts about all software components 
available in the population and the dependencies between them. Since we want to 
retain the possibility of configuring components before delivery, the SKB is 
required to represent their variability. To raise the level of automation, we want 
to explore the possibility of generating configuration-related artifacts from the 
SKB: 

Configurators. Since customers have to configure the product they acquire, 
some kind of user interface is needed as a means of communication between 
customer and SKB. The output of a configurator is a selection of features. 
Suites. To effectively deliver product instantiations to customers, the SKB is 
used to bundle a configured component together with all its dependencies in 
a configuration suite that is suitable for deployment. The configuration suite 
represents an abstraction of component composition. 

Crucial to the realization of these goals is the consistency of the delivered configu- 
rations. Since components are composed into configuration suites before delivery, 
it is necessary to characterize the relation between component variability and 
dependencies. 



3 Degrees of Component Variability 



A component may depend on other components. Such a client component re- 
quires the presence of another component or some variant thereof. A precondition 
for correct composition of components is that a dependent component supports 
the features that are required by the client component. Figure 1 depicts three 
possibilities of relating component variability and composition. 

The first case is when there is no variability at all. A component $C_a$ requires 
components $C_1, \ldots, C_n$. The component dependencies $C_1, \ldots, C_n$ should 
just be present somehow for the correct operation of $C_a$. The resulting system is 
the composition of $C_a$ and $C_1, \ldots, C_n$, and all component dependencies that 
are transitively reachable from $C_1, \ldots, C_n$. 

Figure 1 (b) and (c) show the case that all components have configuration 
interfaces in the form of feature diagrams [4] (the triangles). These feature 
diagrams express the components' variability. The stacked boxes indicate that a 
component can be instantiated to different variants. The shaded triangles indicate 
that $C_b$ and $C_c$ depend on specific variants of $C_1, \ldots, C_n$. Features that 
remain to be selected by customers are thus local to the chosen top component 
($C_b$ or $C_c$, respectively). 

Fig. 1. Degrees of component variability: panels (a), (b) and (c) 

The component dependencies of $C_b$ are still fixed. For component $C_c$, however, 
the component dependencies have become variable themselves: they depend on 
the selection of features described in the configuration interface of $C_c$. This 
allows components to be the units of variation. A consequence might be, for example, 
that when a client enables feature a, $C_c$ requires component A. However, if 
feature b had been enabled, $C_c$ would depend on B. The set of constituent 
components of the resulting system may differ according to the selected variant 
of $C_c$. 

When composing $C_a$ into a configuration suite, components $C_1, \ldots, C_n$ just 
have to be included. Components with variability, however, should be assembled 
into a suite guided by a valid selection of features declared by the top component 
(the component initially selected by the customer). Clients, both customers and 
requiring components, must select sets of features that are consistent with the 
feature diagram of the requested component. 

How to establish these conditions automatically is deferred until after Sect. 4, 
where we introduce a domain specific language for describing components with 
variability. 



4 Component Description Language 

To formally evaluate component composition in the presence of variability, a 
language is needed to express the component variability described in Sect. 3. 
For this, Component Description Language (CDL) is presented. This language 
was designed primarily for the sake of exposition; the techniques presented here 
could just as well be used in the context of existing languages. The language 
will serve as a vehicle for the evaluation of the situation in Fig. 1 (c), that is: 
component dependencies may depend themselves on feature selections. 

For the sake of illustration, we use the ATerm library as an example component. 
The ATerm library is a generic library for a tree-like data structure, called 
Annotated Term (ATerm). It is used to represent (abstract) syntax trees in the 
Asf+Sdf Meta-Environment [5], and it in many ways resembles the aforemen- 
tioned Tree component. The library exists in both Java and C implementations. 
We have elicited some variable features from the Java implementation. The com- 
ponent description for the Java version is listed in Fig. 2. 

A component description is identified by a name (aterm-java) and a version 
(1.3.2). Next to the identification part, CDL descriptions consist of two sections: 
the features section and the requires section. 

The features section has a syntax similar to Feature Description Language 
(FDL) as introduced in [12]. FDL is used since it is easier to automatically ma- 
nipulate than visual diagrams due to its textual nature. The features section 
contains definitions of composite features starting with uppercase letters. 
Composite features obtain their meaning from feature expressions that indicate how 
sub-features are composed into composite features. Atomic features cannot be 
decomposed and start with a lowercase letter. 

    component description ("aterm-java", "1.3.2")
    features
      ATerm   : all(Nature, Sharing, Export, visitors?)
      Nature  : one-of(native, pure)
      Sharing : one-of(nosharing, sharing)
      Export  : more-of(sharedtext, text)
      sharedtext requires sharing
    requires
      when sharing {
        ("shared-objects", "1.3") with fasthash
      }
      when visitors {
        ("JJTraveler", "0.4.2")
      }

Fig. 2. Description of aterm-java 

The ATerm component exists in two implementations: a native one (implemented 
using the Java Native Interface, JNI), and a pure one (implemented in plain Java). 
The composite feature Nature makes this choice explicit to clients of this 
component. The feature obtains its meaning from the expression 
one-of(native, pure). It indicates that either native or pure may be selected for 
the variable feature Nature, but not both. Both native and pure are atomic features. 
Other variable features of the ATerm library are the use of maximal sub-term sharing 
(Sharing) and an inclusive choice of some export formats (Export). Additional 
constraints can be used to reduce the feature space. For example, the sharedtext 
feature enables the serialization of ATerms, so that ATerms can be written to file 
while retaining maximal sharing. Obviously, this feature requires the sharing 
feature. Therefore, the features section contains the constraint that sharedtext 
cannot be enabled without enabling sharing. 

The requires section contains component dependencies. A novel aspect of 
CDL is that these dependencies may be guarded by atomic features to state 
that they fire when a particular feature is enabled. These dependencies are condi- 
tional dependencies. They enable the specification of variable features for which 
components themselves are the unit of variation. 

As an example, consider the conditional dependency on the shared-objects 
component, which implements maximal sub-term sharing for tree-like objects. If the 
sharing feature is enabled, the ATerm component requires the shared-objects 
component. As a result, it will be included in the configuration suite. Note that 
elements of the requires section refer to variants of the required components. This 
means that component dependencies are configured in the same way as customers would 
configure a component. Configuration occurs by way of passing a list of atomic 
features to the required component. In the example this happens for the 
shared-objects dependency, where the variant containing optimized hash functions is 
chosen. 
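
How guarded dependencies determine the contents of a configuration suite can be sketched as follows. The data structure and the function configuration_suite are assumptions made for illustration; they are not part of CDL or of the SKB prototype, versions are omitted, and the sketch assumes that each component occurs in at most one variant per suite.

```python
# Hypothetical in-memory form of the "requires" sections. Each entry is
# (guard feature, required component, features passed to that component).
DESCRIPTIONS = {
    "aterm-java": [
        ("sharing", "shared-objects", ["fasthash"]),
        ("visitors", "JJTraveler", []),
    ],
    "shared-objects": [],
    "JJTraveler": [],
}


def configuration_suite(component, selected, descriptions=DESCRIPTIONS):
    """Collect the component variants reachable under a feature selection."""
    suite = {}
    pending = [(component, frozenset(selected))]
    while pending:
        name, features = pending.pop()
        if name in suite:
            # Assumption of this sketch: one variant per component and suite.
            if suite[name] != features:
                raise ValueError(f"conflicting variants requested for {name}")
            continue
        suite[name] = features
        for guard, required, passed in descriptions[name]:
            if guard in features:  # the conditional dependency fires
                pending.append((required, frozenset(passed)))
    return suite
```

For the selection pure, sharing and text, the suite contains aterm-java and the fasthash variant of shared-objects, but not JJTraveler, since the visitors feature is not enabled.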
5 Guaranteeing Consistency 

Since a configuration interface is formulated in FDL, we need a way to rep- 
resent FDL feature descriptions in the SKB. Our prototype SKB is based on 
the calculus of binary relations, following [6]. The next paragraphs are therefore 
devoted to describing how feature diagrams can be translated to relations, and 
how querying can be applied to check configurations and obtain the required set 
of dependencies. 



Transformation to Relations. The first step proceeds through three interme- 
diate steps. First of all, the feature definitions in the features section are inlined. 
This is achieved by replacing every reference to a composite feature with its 
definition, starting at the top of the diagram. For our example configuration 
interface, the result is the following feature expression: 

    all(one-of(native, pure), one-of(nosharing, sharing), 
        more-of(sharedtext, text), visitors?) 
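
The inlining step can be sketched in a few lines of Python. The tuple encoding of feature expressions and the helper names inline and show are assumptions of this sketch, not the representation used in the prototype; the sketch also assumes that the definitions are acyclic.

```python
# Feature definitions of aterm-java in a hypothetical tuple encoding:
# strings are feature names, ("opt", f) is an optional feature f?, and
# (operator, [operands]) is a composite expression.
DEFINITIONS = {
    "ATerm": ("all", ["Nature", "Sharing", "Export", ("opt", "visitors")]),
    "Nature": ("one-of", ["native", "pure"]),
    "Sharing": ("one-of", ["nosharing", "sharing"]),
    "Export": ("more-of", ["sharedtext", "text"]),
}


def inline(expr, definitions):
    """Replace every reference to a composite feature with its definition."""
    if isinstance(expr, str):
        if expr[0].isupper():  # composite features start with uppercase
            return inline(definitions[expr], definitions)
        return expr  # atomic features start with lowercase
    operator, operand = expr
    if operator == "opt":
        return ("opt", inline(operand, definitions))
    return (operator, [inline(e, definitions) for e in operand])


def show(expr):
    """Render an expression in FDL-like concrete syntax."""
    if isinstance(expr, str):
        return expr
    operator, operand = expr
    if operator == "opt":
        return show(operand) + "?"
    return f"{operator}({', '.join(show(e) for e in operand)})"
```

Calling show(inline("ATerm", DEFINITIONS)) reproduces the inlined feature expression given above.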

The second transformation maps this feature expression and additional con- 
straints to a logical proposition, by applying the following correspondences: 

all (/l> •••> fn ) e-*- Aie{l,...,n} fi 

more-of (/i, ..., f n ) V; e {i,...,„} fi 
one-of (A, ..., f n ) ^ \/i 6 {i,. ..,„}(/< A -, '(Vje{i,..,i-i,i+i,...,n} fi)) 

Optional features reduce to T. Atomic features are mapped to logical variables 
with the same name. Finally, a requires constraint is translated to an implica- 
tion. By applying these mappings to the inlined feature expression, one obtains 
the following formula. 

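$$((\mathit{native} \land \lnot\mathit{pure}) \lor (\mathit{pure} \land \lnot\mathit{native})) \land ((\mathit{nosharing} \land \lnot\mathit{sharing}) \lor (\mathit{sharing} \land \lnot\mathit{nosharing})) \land (\mathit{sharedtext} \lor \mathit{text}) \land (\mathit{sharedtext} \rightarrow \mathit{sharing})$$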
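
As an illustration of the correspondences above, the following Python sketch evaluates the inlined feature expression directly and enumerates all satisfying assignments by brute force. The nested-tuple encoding, the "opt" tag for optional features, and all function names are assumptions made for this example; they are not part of FDL or of the prototype SKB.

```python
from itertools import product

# Feature expressions as nested tuples; atomic features are plain strings.
# ("opt", f) stands for the optional feature "f?".
EXPR = ("all",
        ("one-of", "native", "pure"),
        ("one-of", "nosharing", "sharing"),
        ("more-of", "sharedtext", "text"),
        ("opt", "visitors"))

# requires constraints as (feature, required feature) pairs
REQUIRES = [("sharedtext", "sharing")]


def holds(expr, v):
    """Evaluate a feature expression under the truth assignment v."""
    if isinstance(expr, str):      # atomic feature: logical variable
        return v[expr]
    op, *args = expr
    if op == "opt":                # optional features reduce to true
        return True
    vals = [holds(a, v) for a in args]
    if op == "all":                # conjunction
        return all(vals)
    if op == "more-of":            # disjunction
        return any(vals)
    if op == "one-of":             # exactly one alternative is true
        return sum(vals) == 1
    raise ValueError(f"unknown operator: {op}")


def atoms(expr):
    """Collect the atomic feature names of an expression."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(atoms(a) for a in expr[1:]))


def consistent(v):
    """The feature expression plus every requires constraint (an implication)."""
    return holds(EXPR, v) and all(not v[a] or v[b] for a, b in REQUIRES)


def valid_configurations():
    """Brute-force enumeration of all satisfying assignments."""
    names = sorted(atoms(EXPR))
    for bits in product([False, True], repeat=len(names)):
        v = dict(zip(names, bits))
        if consistent(v):
            yield {n for n in names if v[n]}


configs = list(valid_configurations())
```

The diagram is consistent exactly when `configs` is non-empty. Enumeration is exponential in the number of atomic features, which is the cost the BDD representation is meant to avoid.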

Checking the consistency of the feature diagram now amounts to obtaining sat- 
isfiability for this logical sentence. To achieve this, the formula is transformed to 
a Binary Decision Diagram (BDD) [1]. BDDs are logical expressions ITE(φ, ψ, ξ) 
representing if-then-else constructs. Using standard techniques from model 
checking, any logical expression can be transformed into an expression consisting only 
of if-then-else constructs. If common subexpressions of this if-then-else expres- 
sion are shared we obtain a directed acyclic graph which can easily be embedded 
in the relational paradigm. The BDD for the aterm-java component is depicted 
in Fig. 3. 
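
The construction can be sketched in Python as follows. The sketch expands the formula variable by variable (Shannon expansion) and keeps a table of already created ITE nodes so that equal subexpressions are shared. The variable order, the integer node ids, and the tuple encoding of the graph are assumptions of this example; the graph it produces need not coincide with the one in Fig. 3.

```python
from itertools import product

# Variable order is an assumption; a different order can change the node count.
ORDER = ["native", "pure", "nosharing", "sharing", "sharedtext", "text"]


def formula(v):
    """Proposition of the running example; a != b encodes (a and not b) or (b and not a)."""
    return bool((v["native"] != v["pure"])
                and (v["nosharing"] != v["sharing"])
                and (v["sharedtext"] or v["text"])
                and (not v["sharedtext"] or v["sharing"]))


nodes = {}   # unique table: (guard, then, else) -> node id; terminals are 0 and 1


def ite(guard, then, els):
    """Create or reuse the node ITE(guard, then, els)."""
    if then == els:                 # both branches agree: the test is redundant
        return then
    return nodes.setdefault((guard, then, els), len(nodes) + 2)


def build(fixed=None, i=0):
    """Shannon expansion of `formula` along ORDER, sharing equal subgraphs."""
    fixed = fixed or {}
    if i == len(ORDER):
        return 1 if formula(fixed) else 0
    var = ORDER[i]
    then = build({**fixed, var: True}, i + 1)
    els = build({**fixed, var: False}, i + 1)
    return ite(var, then, els)


def evaluate(node, v):
    """Follow then/else edges from `node` until a terminal is reached."""
    by_id = {n: key for key, n in nodes.items()}
    while node > 1:
        guard, then, els = by_id[node]
        node = then if v[guard] else els
    return node == 1


root = build()

# The DAG as a set of tuples: one possible relational encoding (an assumption).
relation = {(n, g, t, e) for (g, t, e), n in nodes.items()}
```

The naive expansion visits all assignments and is only meant to show the shape of the result; production BDD packages build the graph without enumerating the full truth table.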

Querying the SKB. Now that we have described how feature diagrams are 
transformed to a form suitable for storing in the SKB, we turn our attention 
to the next step: the querying of the SKB for checking feature selections and 
obtaining valid configurations. 
The BDD graph consists of nodes la- 
beled by guards. Each node has two out- 
going edges, corresponding to the boolean 
value a particular node obtains for a cer- 
tain assignment. All paths from the root 
to T represent minimal assignments that 
satisfy the original formula. 

A selection of atomic features corre- 
sponds to a partial truth-assignment. This 
assignment maps the guard of each selected 
feature to 1 (true). Let φ be the BDD derived 
from the feature diagram for which we want 
to check the consistency of the selection; then 
the meaning of a selection is defined as: 

$$\{a_1, \ldots, a_n\} \mapsto \bigcup_{i \in \{1,\ldots,n\}} [a_i/1] \quad \text{when } a_i \in \varphi$$

Checking 
whether this assignment can be part of a 
valuation amounts to finding a path in the 
BDD from the root to T containing the edges corresponding to the assignment. 
If there is no such path, the enabled features are incorrect. If there is such a 
path, but some other features must be enabled too, the result is the set of pos- 
sible alternatives to extend the assignment to a valuation. The queries posted 
against the SKB use a special built-in query that generates all paths in a BDD. 
The resulting set of paths is then filtered according to the selection of features 
that has to be checked. The answer will be one of: 

— {{p_1, ..., p_m}, ...}: a set of possible extensions of the selection, 
indicating an incomplete selection 

— {{}}: one empty extension, indicating a correct selection 

— {}: no possible extension, indicating incorrect selection 

If the set of features was correct, the SKB is queried to obtain the set of config- 
ured dependencies that follow from the feature selection. 

Take for example the selection of features {pure, sharedtext, visitors}. 
The associated assignment is [pure/1][sharedtext/1]. There is one path to T in 
the BDD that contains this assignment, so there is a valuation for this selection 
of features. Furthermore, it implies that the selection is not complete: part of 
the path is the truth assignment of sharing, so it has to be added to the set 
of selected features. Finally, as a consequence of the feature selection, both the 
JJTraveler and SharedObjects components must be included in the configuration 
suite. 
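
The check of this example selection can be reproduced with the following Python sketch. It does not use the built-in path query of the SKB, whose interface is not specified here; instead it enumerates the paths to T by Shannon expansion over an assumed variable order, skipping tests whose two branches denote the same function. A path is accepted only if it contains the 1-edge of every selected feature that occurs in the formula, which is one reading of the filtering step described above. All names are hypothetical.

```python
from itertools import product

# Assumed variable order of the decision diagram.
ORDER = ["native", "pure", "nosharing", "sharing", "sharedtext", "text"]


def formula(v):
    """Proposition of the running example; a != b encodes (a and not b) or (b and not a)."""
    return bool((v["native"] != v["pure"])
                and (v["nosharing"] != v["sharing"])
                and (v["sharedtext"] or v["text"])
                and (not v["sharedtext"] or v["sharing"]))


def table(fixed, rest):
    """Truth table of the formula over the variables in `rest`, given `fixed`."""
    return tuple(formula({**fixed, **dict(zip(rest, bits))})
                 for bits in product([False, True], repeat=len(rest)))


def paths_to_true(fixed, path, i):
    """All root-to-T paths; `path` records only the guards actually tested."""
    t = table(fixed, ORDER[i:])
    if not any(t):                       # reached the false terminal
        return []
    if all(t):                           # reached T
        return [path]
    var = ORDER[i]
    lo = {**fixed, var: False}
    hi = {**fixed, var: True}
    if table(lo, ORDER[i + 1:]) == table(hi, ORDER[i + 1:]):
        return paths_to_true(lo, path, i + 1)        # redundant test, no node
    return (paths_to_true(lo, {**path, var: False}, i + 1)
            + paths_to_true(hi, {**path, var: True}, i + 1))


def check(selection):
    """Return the set of possible extensions of a feature selection."""
    relevant = set(selection) & set(ORDER)   # features outside the formula drop out
    extensions = set()
    for path in paths_to_true({}, {}, 0):
        if all(path.get(f) is True for f in relevant):
            enabled = {f for f, value in path.items() if value}
            extensions.add(frozenset(enabled - relevant))
    return extensions


result = check({"pure", "sharedtext", "visitors"})
```

Under these assumptions `result` contains a single extension consisting of sharing, which corresponds to the incomplete selection discussed in the text. An empty set signals an incorrect selection, and a set holding only the empty extension signals a correct one.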

6 Discussion 

Related Work. CDL is a domain specific language for expressing component 
level variability and dependencies. The language combines features previously 
seen in isolation in other areas of research. These include: package based software 
development, module interconnection languages (MILs), and product configura- 
tion. 

First of all, the work reported here can be seen as a continuation of package 
based software development [2]. In package based software development, software 
is componentized in packages which have explicit dependencies and configuration 
interfaces. These configuration interfaces declare lists of options that can be 
passed to the build processes of the component. Composition of components 
is achieved through source tree composition. There is no support for packages 
themselves being units of variation. A component description in CDL can be 
interpreted as a package definition in which the configuration interface is replaced 
by a feature description. The link between feature models and source packages is 
further explored in [11]. However, variability is described external to component 
descriptions, on the level of the composition. 

Secondly, CDL is a kind of module interconnection language (MIL). Although 
the management of variability has never been the center of attention in the 
context of MILs, CDL complies with two of the main concepts of MILs [7]: 

— The ability to perform static type-checking at an intermodule level of de- 
scription. 

— The ability to control different versions and families of a system. 

Static type-checking of CDL component compositions is achieved by model 
checking of FDL. Using dependencies and feature descriptions, CDL naturally 
allows control over different versions and families of a system. Variability in tra- 
ditional MILs boils down to letting more than one module implement the same 
module interface, so modules are the primary unit of variation. In contrast, 
CDL descriptions express variability without committing beforehand to a unit 
of variation. 

We know of one other instance of applying BDDs to configuration problems. 
In [8] algorithms are presented to achieve interactive configuration. The con- 
figuration language consists of boolean sentences which have to be satisfied for 
configuration. The focus of the article is that customers can interactively config- 
ure products and get immediate feedback about their (valid or invalid) choices. 
Techniques from partial evaluation and binary decision diagrams are combined 
to obtain efficient configuration algorithms. 

Contribution. Our contribution is threefold. First, we have introduced variabil- 
ity at the component level to enable the product family approach in component- 
based product populations. We have characterized how component variability 
can be related to composition, and presented a formal language for the evalua- 
tion of this. 

Secondly, we have demonstrated how feature descriptions can be transformed 
to BDDs, thereby proving the feasibility of a suggestion mentioned in the future 
work of [12]. Using BDDs there is no need to generate the exponentially large 
configuration space to check the consistency of feature descriptions and to verify 
user requirements. 
Finally, we have indicated how BDDs can be stored in a relational SKB, 
which was our starting point for automated software delivery and generation of 
configurations. 

The techniques presented in this paper have been implemented in an experimental 
relational expression evaluator, called Rscript. Experiments revealed 
that checking feature selections through relational queries is perhaps not the 
most efficient method. Nevertheless, the representation of feature descriptions is 
now seamlessly integrated with the representation of other software facts. 



Future Work. Future work will primarily be geared towards validating the ap- 
proach outlined in this paper. We will use the Asf+Sdf Meta-Environment [5] 
as a case-study. The Asf+Sdf Meta-Environment is a component-based envi- 
ronment to define syntax and semantics of (programming) languages. Although 
the Meta-Environment was originally targeted for the combination of Asf (Al- 
gebraic Specification Formalism) and Sdf (Syntax Definition Formalism), direc- 
tions are currently explored to parameterize the architecture in order to reuse 
the generic components (e.g., the user interface, parser generator, editor) for 
other specification formalisms [10]. Furthermore, the constituent components of 
the Meta-Environment are all released separately. Thus we could say that the 
development of the Meta-Environment is evolving from a component-based sys- 
tem towards a component-based product population. To manage the ensuing 
complexity of variability and dependency interaction we will use (a probably 
extended version of) CDL to describe each component and its variable depen- 
dencies. 

In addition to the validation of CDL in practice, we will investigate whether 
we could extend CDL to make it more expressive. For example, in this paper 
we have assumed that component dependencies should be fully configured by 
their clients. A component client refers to a variant of the required component. 
One can imagine that it might be valuable to let component clients inherit the 
variability of their dependencies. The communication between client component 
and dependent component thus becomes two-way: clients restrict the variability 
of their dependencies, which in turn add variability to their clients. Developers 
are free to determine which choices customers can make, and which are made 
for them. 

The fact that client components refer to variants of their dependencies in- 
duces a difference in binding time between user configuration and configuration 
during composition [3]. The difference could be made a parameter of CDL by 
tagging atomic features with a time attribute. Such a time attribute indicates the 
moment in the development and/or deployment process the feature is allowed to 
become active. Since all moments are ordered in a sequence, partial evaluation 
can be used to partially configure the configuration interfaces. Every step effects 
the binding of some variation points to variants, but may leave other features 
unbound. In this way one could, for example, discriminate features that should 
be bound by conditional compilation from features that are bound at activation 
time (e.g., via command-line options). 
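
The idea of staged binding can be illustrated with a small Python sketch. It tags each atomic feature of the running example with a binding moment and only accepts a partial configuration if some completion still satisfies the feature constraints. The stage names, the assignment of features to stages, and the functions are hypothetical and serve only to illustrate the proposal; they are not part of CDL.

```python
from itertools import product

# Hypothetical time tags for the atomic features of the running example.
BINDING_TIME = {
    "native": "compile", "pure": "compile",
    "nosharing": "compile", "sharing": "compile",
    "sharedtext": "activation", "text": "activation",
}


def formula(v):
    """Constraints of the example; a != b encodes (a and not b) or (b and not a)."""
    return bool((v["native"] != v["pure"])
                and (v["nosharing"] != v["sharing"])
                and (v["sharedtext"] or v["text"])
                and (not v["sharedtext"] or v["sharing"]))


def satisfiable(bound):
    """True if some assignment of the unbound features satisfies the formula."""
    free = [f for f in BINDING_TIME if f not in bound]
    return any(formula({**bound, **dict(zip(free, bits))})
               for bits in product([False, True], repeat=len(free)))


def bind(bound, stage, choices):
    """Bind features tagged with `stage`; features of other stages stay untouched."""
    for feature in choices:
        if BINDING_TIME[feature] != stage:
            raise ValueError(f"{feature} cannot be bound at {stage} time")
    result = {**bound, **choices}
    if not satisfiable(result):
        raise ValueError("choices leave no valid configuration")
    return result


# Compile-time step: the text features remain unbound.
compiled = bind({}, "compile", {"native": False, "pure": True,
                                "nosharing": True, "sharing": False})
# Activation-time step: completes the configuration.
deployed = bind(compiled, "activation", {"sharedtext": False, "text": True})
```

In this sketch, choosing nosharing at compile time rules out sharedtext at activation time, so an attempt to bind it later is rejected.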
Abstract. Successful reusable software for large-scale distributed systems often 
must operate in multiple contexts, e.g., due to (1) integration with other systems 
using different technologies and platforms, (2) constant fine tuning needed to 
satisfy changing customer needs, and (3) evolving market conditions resulting 
from new laws and regulations. This situation causes vexing challenges for de- 
velopers of reusable software, who must manage the variation between these 
contexts without overcomplicating their solutions and exceeding project time 
and effort constraints. This paper provides three contributions to R&D efforts 
that address these challenges. First, it motivates the use of a concern-based ap- 
proach to enhance the level of abstraction at which component-based distrib- 
uted systems are developed and reused to (1) improve software quality and de- 
veloper productivity, and (2) localize variability aspects to simplify substitution 
of reusable component implementations. Second, we present our experience 
dealing with different domain- and middleware-specific variability gained ap- 
plying model-based component middleware software technologies to develop 
an Inventory Tracking System that manages the flow of goods in warehouses. 
Third, we present a concern-based research strategy aimed at effectively man- 
aging the variability caused by the presence of multiple middleware platforms 
and technologies. Our experience to date shows that using model-based soft- 
ware tools and component middleware as the core elements of software compo- 
sition and reuse - in conjunction with concern-based commonality and vari- 
ability analysis - helps reduce development complexity, improve system main- 
tainability and reuse, and increase developer productivity. 

Keywords: Commonality/Variability Analysis, Concern, Aspect, Model Driven 
Architecture, Component Middleware, CORBA Component Model (CCM). 



1 Introduction 



Emerging trends and challenges. Developing large-scale distributed systems is 
complex, time-consuming, and expensive. Ideally, distributed systems are developed 
as product-line architectures [SAIP] that can be specialized in accordance with 
customer needs, hardware/software platform characteristics, and different quality of 
service (QoS) requirements. To minimize the impact of the modifications that are 
needed for specializing product-line architectures, architects and developers need to 
decouple the stable parts of their system from the variable parts [Coplien99]. 

One source of variability in large-scale distributed systems is the heterogeneity of 
APIs supplied by different operating systems (e.g., threading and synchronization 
mechanisms in Windows and UNIX [C++NPv1]), GUI toolkits (e.g., MFC, Motif, 
GTK, and Qt), and middleware (e.g., CORBA, J2EE, and .NET). Moreover, each 
technology provides its own mechanisms to configure and fine tune the system to 
address customer’s needs. Another source of variability is different business rules 
(such as changing tax codes or air quality regulations) mandated by governments in 
different countries or regions. To maximize the benefits of software reuse, therefore, 
these sources of variability must be addressed via composition, encapsulation, and 
extension mechanisms that support alternative configurations and implementations of 
reusable functionality. 

Sources of variability in large-scale distributed systems have historically been man- 
aged by raising the level of abstraction used to develop, integrate, and validate soft- 
ware. For example, the heterogeneity (and accidental complexity) of assembly lan- 
guages in the 1960s and 1970s motivated the creation and adoption of standard third- 
generation programming languages (such as Ada, C, C++, and Java), which raised the 
abstraction level and helped improve the efficiency and quality of software develop- 
ment. Likewise, the complexity of developing large-scale systems from scratch on 
heterogeneous OS APIs motivated the creation and adoption of frameworks and pat- 
terns that provide application servers (such as CORBA, J2EE and .NET), which factor 
out reusable structures and behaviors in mature domains into standard reusable mid- 
dleware. 

Despite the advantages of refactoring commonly occurring capabilities into high-level 
reusable tools and services, challenges remain due to the diversity of alternatives for a 
given technology. For example, there are many different higher-level programming 
languages, frameworks, and middleware platforms that solve essentially the same types 
of problems, yet are non-portable and non-interoperable. Ironically, many of these 
tools and services were positioned initially as integration technologies designed to 
encapsulate the heterogeneity of lower-level tools and services. This irony is caused 
by both the broader domain of systems that the new abstraction layer tries to cover, 
and the way in which the tools and services are implemented. Over time, however, the 
collection of integration technologies simply became another level of heterogeneity 
that needs to be encapsulated by the next generation of integration technologies. 



Solution approach: an Integrated Concern Modeling and Manipulation Environment. 
To address the challenges stemming from the heterogeneity of middleware platforms 
and to elevate the abstraction level associated with developing large-scale distributed 
systems using third-generation programming languages, we have been developing an 
Integrated Concern Modeling and Manipulation Environment (ICMME) that defines 
and manipulates fundamental concerns (such as remoting, component lifecycle 
management, communication and processor resource usage, and persistency) that 
represent higher level system building blocks than components or classes in an 
object-oriented approach. Our ICMME combines key techniques and tools from Model 
Driven Architecture (MDA) [MDA], Aspect-Oriented Software Development (AOSD) [AOSD], 
and component middleware paradigms to provide a higher-level environment for de- 
veloping large-scale distributed systems. 

Experience we gained from developing frameworks [C++NPv1, C++NPv2] and middleware 
platforms [TAO1, CIAO1] enabled us to identify and document core patterns 
[POSA2] for managing different types of middleware variability. To evaluate the ex- 
tent to which ICMME technologies help to address variabilities at different levels of 
abstractions (i.e., from the variability of configuring optional settings of a particular 
middleware platform up to the variability of handling different middleware 
platforms), we have developed a prototypical Inventory Tracking System (ITS). This 
paper uses our ITS prototype to illustrate the reuse benefits of ICMME-based 
integration by focusing on a fundamental concern - remoting - and then (1) developing 
an MDA model of a component-based remoting infrastructure according to the 
patterns described in [Voelter], (2) visually mapping ITS components to the roles defined 
by these patterns, and (3) localizing the impact of variability, caused by a need to 
support different middleware, by creating a domain-specific code generator to 
produce code for a real-time CORBA Component Model (CCM) [CCM] implementation 
called The Component Integrated ACE ORB (CIAO). Creating generators for 
J2EE and .NET component middleware remains future work. 

Paper organization. The remainder of this paper is organized as follows: Section 2 
describes the structure and functionality of our ITS prototype; Section 3 discusses the 
lessons learned thus far from applying our ICMME approach to the ITS case study; 
Section 4 compares our ICMME approach with related work; and Section 5 presents 
concluding remarks. 



2 Overview of the ITS Case Study 

An Inventory Tracking System (ITS) is a warehouse management system that moni- 
tors and controls the flow of goods and assets. Users of an ITS include couriers, such 
as UPS, FedEx, and DHL, as well as airport baggage handling systems. A key goal of an 
ITS is to provide convenient mechanisms that manage the movement and flow of in- 
ventory in a timely and reliable manner. For instance, an ITS should enable human 
operators to configure warehouse storage organization criteria, maintain the set of 
goods known throughout a highly distributed system (which may span organizational 
and even international boundaries), and track warehouse assets using GUI-based op- 
erator monitoring consoles. This section presents an overview of the behavior and 
architecture of our ITS prototype and describes how we have integrated MDA tools 
with component middleware to enhance productivity and quality. 



2.1 ITS System Behavior 

Figure 1 shows a UML use case diagram for our ITS prototype. As shown in the fig- 
ure, there are three primary actors in the ITS system. 
Fig. 1. Use Case Diagram for the ITS Prototype 

For the Configurator actor, the ITS provides the ability to configure the set of avail- 
able facilities in certain warehouses, such as the structure of transportation belts, 
routes used to deliver goods, and characteristics of storage facilities (e.g., whether 
hazardous goods are allowed to be stored, maximum allowed total weight of stored 
goods, etc.). For the Operator actor, the ITS provides the ability to reorganize the 
warehouse to fit future changes, as well as to handle other use cases, such as receiving 
goods, storing goods, fetching goods, dumping goods, stock queries, specifying 
delivery time accuracy, and updating operator console views. For the Operating 
Environment actor, the ITS provides the ability to tolerate partial failures due to 
transportation facility problems, such as broken belts. To handle these partial failures 
the ITS dynamically recalculates the delivery possibilities based on available trans- 
portation resources and delivery time requirements. 



2.2 ITS Architecture 

The ITS architecture is based on component middleware developed in accordance 
with the OMG’s CORBA Component Model (CCM) [CCM]. A component is a basic 
meta-type in CCM that consists of a named collection of features - known as ports, 
i.e., event sources/sinks, facets, and receptacles - that can be associated with a single 
well-defined set of behaviors. In particular, a CCM component provides one or more 
ports that can be connected together with ports exported by other components. CCM 
also supports the hierarchical encapsulation of components into component assem- 
blies, which export ports that allow fine tuning of business logic modeling. 
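
To make the port vocabulary concrete, the following Python sketch models a component as a named collection of ports and connects a receptacle (a required interface) to a facet (a provided interface) when both refer to the same interface. This is a simplified illustration, not the CCM API or CIAO code; the component and interface names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    facets: dict = field(default_factory=dict)         # port -> provided interface
    receptacles: dict = field(default_factory=dict)    # port -> required interface
    event_sources: dict = field(default_factory=dict)  # port -> emitted event type
    event_sinks: dict = field(default_factory=dict)    # port -> consumed event type


def connect(user, receptacle, provider, facet):
    """Connect a receptacle to a facet when both refer to the same interface."""
    required = user.receptacles[receptacle]
    provided = provider.facets[facet]
    if required != provided:
        raise TypeError(f"{user.name}.{receptacle} needs {required}, "
                        f"but {provider.name}.{facet} provides {provided}")
    return (user.name, receptacle, provider.name, facet)


# Hypothetical ports for two ITS components.
storage_facility = Component(
    "StorageFacility",
    facets={"locations": "LocationQuery"},
    event_sinks={"updates": "StockChanged"},
)
workflow_manager = Component(
    "WorkflowManager",
    receptacles={"storage": "LocationQuery"},
    event_sources={"stock": "StockChanged"},
)

connection = connect(workflow_manager, "storage", storage_facility, "locations")
```

Checking interface names at connection time mirrors, in a very reduced form, the type checking that a component assembly tool would perform.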

Figure 2 illustrates the key components that form the basic implementation and inte- 
gration units of our ITS prototype. Some ITS components (such as the Operator Con- 
sole component) expose interfaces to end users, i.e., ITS operators. Other compo- 
nents represent warehouse hardware entities (such as cranes, forklifts, and shelves) 
and expose interfaces to manage databases (such as Transportation Facility compo- 
nent and the Storage Facility component). Yet another set of components (such as the 
Workflow Manager and Storage Manager components) coordinate and control the 
event flow within the ITS system. 
Fig. 2. Key CCM ITS Architecture Components 

As illustrated in Figure 2, the ITS architecture consists of the following three subsys- 
tems: 

1. Warehouse Management (WM) subsystem, which is a set of high-level func- 
tionality and decision making components. This level of abstraction calculates the 
destination location and delegates the rest of the details to the Material Flow Con- 
trol (MFC) subsystem. In particular, the WM does not provide capabilities such as 
route calculation for transportation or reservation of intermediate storage units. 

2. Material Flow Control (MFC) subsystem, which is responsible for executing 
high-level decisions calculated by the WM subsystem. The primary task of the 
MFC is to deliver goods to the destination location. This subsystem handles all 
related details, such as route (re)calculation, transportation facility reservation, and 
intermediate storage reservation. 

3. Warehouse Hardware (WH) subsystem, which is responsible for dealing with 
physical devices, such as sensors and transportation units (e.g., belts, forklifts, 
cranes, pallet jacks, etc.). 

The functionality of these three ITS subsystems is monitored and controlled via an 
Operator Console. All persistence aspects are handled via databases that can be managed 
either by a centralized DBMS or by a DBMS distributed over different DB servers. 
A typical interaction scenario between these three subsystems involves (1) a new 
good arriving at the warehouse entrance and being entered into the ITS either auto- 
matically or manually, (2) the WM subsystem calculating the final destination for 
storing the good by querying the Storage Facility for a list of available free locations 
and passing the final destination to the MFC subsystem, (3) the MFC subsystem calculating 
the transportation route and assigning the required transportation facilities, and (4) 
the MFC subsystem interacting with the WH subsystem to control the transportation 
process and if necessary adapt to changes, such as failures or the appearance of higher 
priority tasks. 



2.3 Applying Component Middleware and MDA Tools to ITS 

To evaluate how component middleware technologies can help improve productivity 
by enabling developers to work at a higher abstraction level than objects and functions, 
we selected the Component Integrated ACE ORB (CIAO) [CIAO1, CIAO2] as 
the run-time platform for our ITS prototype. CIAO is QoS-enabled CCM middleware 
built atop The ACE ORB (TAO) [TAO1, TAO2]. TAO is a highly configurable, 
open-source Real-time CORBA Object Request Broker (ORB) that implements key 
patterns [POSA2] to meet the demanding QoS requirements of distributed real-time 
and embedded (DRE) systems. 

CIAO extends TAO to provide the component-oriented paradigm to developers of 
DRE systems by abstracting critical systemic aspects (such as QoS requirements and 
real-time policies) as installable/configurable units supported by the CIAO component 
framework. Promoting these DRE aspects as first-class metadata disentangles (1) 
code for controlling these non-functional aspects from (2) code that implements the 
application logic, thereby making DRE system development more flexible and 
productive. CIAO and TAO can be downloaded from 
deuce.doc.wustl.edu/Download.html. 

To evaluate how MDA technologies can help improve productivity by enabling de- 
velopers to work at a higher abstraction level than components and classes, we devel- 
oped and applied a set of modeling tools to automate the following two aspects of ITS 
development: 

1. Warehouse modeling, which simplifies the warehouse configuration aspect of the 
ITS system according to the equipment available in certain warehouses, including 
moving conveyor belts and various types of cranes. These modeling tools can 
synthesize the ITS database configuration and population. 

2. Modeling and synthesizing the deployment and configuration (D&C) [D&C] as- 
pects of the components that implement the ITS functionality. These modeling 
tools use MDA technology in conjunction with the CCM to develop, assemble, and 
deploy ITS software components. 

These two aspects are relatively orthogonal to each other in terms of aspect separa- 
tion, i.e., they depict the overall system from different perspectives, yet they are com- 
plementary to each other. For example, Figure 3 shows how the system modeler and 
warehouse modeler play different roles in the ITS development process. 

The system modeler studies the business logic of a general ITS and produces a model 
describing the software aspect of the system, including CCM components, 
deployment/assembly specifications, and QoS requirements. The warehouse modeler, in 
contrast, is responsible for modeling one or a group of specific warehouses. 
Fig. 3. ITS Modeling Aspects 



The warehouse and component model aspects can be implemented separately during 
system development, i.e., the warehouse model can be mapped to the CCM and D&C 
model by means of MDA-based code generation to fully materialize an ITS system. 
There exist, however, some concerns that span these two aspects. For example, the 
number of components and the way they communicate with each other can influence 
the configuration of different infrastructural aspects, such as real-time event channels 
[Harrison]. In ITS, however, a warehouse modeler often needs to fine tune the 
configuration on the basis of the warehouse model. In these cases, different actions are 
applied according to the nature of the concern after necessary analysis. For example, 
the remoting concern may involve determining the mode of communication, such as 
asynchronous publisher-subscriber or synchronous request-response. 



3 Towards an Integrated Concern Modeling and Manipulation 
Environment 



Section 2.3 introduced a set of middleware components and modeling tools that 
helped increase the productivity and quality of our ITS development process. Based 
on our experience with the ITS prototype, we contend that to make software for dis- 
tributed systems more reusable and resilient to future changes, we need to model and 
manipulate concerns separately. This section describes an Integrated Concern Mod- 
eling and Manipulation Environment (ICMME) that we are developing to achieve this 
vision. 



3.1 Research Foundations 



Based on our experience in developing large-scale distributed systems over the past 
two decades [POSA2, C++NPv1, C++NPv2], we have observed several problems that 
underlie the challenges outlined in Section 1. Below, we outline two of these 
problems and briefly describe how emerging technologies like MDA and AOSD [Ho02] 
could be exploited to address them. 

3.1.1 Low-level abstractions and tools. Despite improvements in third-generation 
programming languages (such as Java or C++) and run-time platforms (such as com- 
ponent middleware), the level of abstraction at which business logic is integrated with 
the set of rules and behavior dictated by component models is still too low. For 
example, different component models provide different sets of APIs and rules for 
component lifecycle management; e.g., there are multiple lifecycle management 
mechanisms available in CCM. As a result, this set of rules typically affects the 
implementation of business logic intrusively, i.e., the business logic developer implicitly 
assumes certain behavior from the component container and must adapt business logic 
accordingly. In addition, the level of abstraction and composition supported by third- 
generation languages does not intuitively reflect the concepts used by today’s soft- 
ware developers [Mezini02], who are using higher level concerns (such as persis- 
tence, remoting, and synchronization) to express their system architectures. 

A promising way to alleviate these problems with low-level abstractions and tools is 
to apply Model Driven Architecture (MDA) techniques [MDA] that express applica- 
tion functional and non-functional requirements at higher levels of abstraction beyond 
third-generation programming languages and conventional component middleware. 
At the core of the MDA is a two-level hierarchy of models: 

• Platform-independent models (PIMs) that describe at a high level how 
applications will be structured and integrated, without concern for the target 
middleware/OS platforms or programming languages on which they will be deployed. 
PIMs provide a formal definition of an application's functionality implemented 
on some form of a virtual architecture. For example, the PIM for the ITS could 
assume that there are two services available for each component: (1) a remote in- 
vocation service, i.e., an object request broker (ORB) and (2) an information 
storage and retrieval service, i.e. a database. At this stage it does not really matter 
whether these services are CORBA or SOAP and whether they use a relational or 
an object database, respectively. 

• Platform-specific models (PSMs) that are constrained formal models that ex- 
press platform-specific details. The PIM models are mapped into PSMs via 
translators. For example, the ITS uses the set of patterns and roles to describe 
the component collaboration infrastructure suggested by the OMG Component 
Collaboration Architecture [EDOC] that is specified in the PIM and could be 
mapped and refined to a specific type in the underlying platform, such as a 
QoS-enabled implementation of the CORBA Component Model (CCM) 
[CIAO1, CIAO2]. 
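
The relation between the two model levels can be illustrated with a small Python sketch in which a PIM names only abstract services and a translator binds them to the technologies of a chosen platform. The dictionary layout, the platform keys, and the pairing of technologies are assumptions made for this example; real MDA tools operate on metamodel-based models rather than on dictionaries.

```python
# A PIM names abstract services only (hypothetical layout).
PIM = {
    "component": "StorageManager",
    "services": ["remote_invocation", "storage"],
}

# Platform descriptions: which technology realizes each abstract service.
# The pairings are illustrative assumptions.
PLATFORMS = {
    "ccm": {"remote_invocation": "CORBA ORB", "storage": "relational database"},
    "soap": {"remote_invocation": "SOAP", "storage": "object database"},
}


def to_psm(pim, platform):
    """Translate a PIM into a PSM by binding every abstract service."""
    binding = PLATFORMS[platform]
    missing = [s for s in pim["services"] if s not in binding]
    if missing:
        raise KeyError(f"platform {platform} does not provide: {missing}")
    return {
        "component": pim["component"],
        "platform": platform,
        "services": {s: binding[s] for s in pim["services"]},
    }


psm_ccm = to_psm(PIM, "ccm")
psm_soap = to_psm(PIM, "soap")
```

The same PIM yields two different PSMs, so supporting an additional platform requires a new translator table rather than a change to the platform-independent description.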

MDA tools use PIMs and PSMs to improve the understanding of software-intensive 
systems using higher-level models that (1) standardize the process of capturing busi- 
ness logic and quality of service (QoS)-related requirements and (2) ensure the con- 
sistency of software implementations with analysis information associated with func- 
tional and systemic QoS requirements captured by models. A key role in reducing 
software complexity via MDA tools is played by meta-modeling [GME], which de- 
fines a semantic type system that precisely reflects the subject of modeling and ex- 
poses important constraints associated with specific application domains. 
3.1.2 Tangled concerns. Different concerns, such as component lifecycle 
management, resource usage, persistence, distribution, and safe/efficient cache and memory 
management, are often tangled within software source code, which impedes effective 
modularity [Kiczales]. If these concerns could also vary (either completely or par- 
tially) then the corresponding incurred variability is also tangled with other concerns 
in source code and probably crosscut the whole system. Variability in commonly tan- 
gled concerns depends on many factors, such as deployment strategy and run-time 
conditions. For example, the communication latency between frequently communi- 
cating components depends on their distribution and deployment. Today, many appli- 
cations are custom programmed manually to implement and compose these “cross- 
cutting” concerns, which is a tedious, error-prone, and non-scalable process. There are 
also limitations with third-generation programming languages that encourage the 
“tyranny of the dominant decomposition” [Tarr99], which involves the inability to 
apply different decomposition strategies simultaneously. Languages that support
multiple inheritance address this problem to a limited extent, whereas languages
without multiple inheritance make this task very hard.

A promising way to alleviate problems caused by tangled concerns is to apply Aspect- 
Oriented Software Development (AOSD) [AOSD] techniques. AOSD techniques go 
beyond object-oriented techniques to enable the design and implementation of sepa- 
rate cross-cutting concerns, which can then be woven together to compose complete 
applications and larger systems. In the absence of AOSD techniques, many concerns 
are tangled with the rest of the application code, thereby increasing software
complexity. AOSD techniques reduce this complexity chiefly by modularizing and
separately handling crosscutting concerns and by generating the final application
by means of aspect weaving tools [Gray1, Gray2].



3.2 Types of Changes Caused by Variability 



To understand what types of concern manipulation functionality should be supported 
by an ICMME, during the design and implementation stages of the ITS project we 
systematically captured the terminology we used to express what types of changes 
were made, as well as their consequences (i.e., affected interfaces, classes, and com- 
ponents). Moreover, based on our daily project experiences and by observing how 
typical change requests occurred, we observed that change requests were often
formulated in terms of features of the system, such as adding, removing, or updating
certain capabilities. Only in trivial cases was just one particular class affected.
Generalizing these observations and combining them with other experiences we have had
developing ACE, TAO, and CIAO, it appears that most practical software systems 
have groups of logically connected classes and components that together implement 
the functionality of a particular concern. This implementation could be either local- 
ized at a certain place or crosscut several software artifacts. As a result, we conclude 
that changes could be categorized into the following types: 

• Local changes, which are typically caused by errors and do not lead to changes in 
relationships between the core system classes/components and also do not change 
roles played by each particular class. A common example of local changes is
refactoring measures [Refactoring], which are caused either by errors or by the
need to organize the code better. Typically, such measures do not lead to role 
changes, which is why they are treated as “local.” 

• Structural changes, which sometimes occur due to (1) the need to add, remove or 
update some functionality, which is implemented (or supposed to be imple- 
mented) by several components/classes, each playing a certain role in collabora- 
tion to achieve the goal of supporting required functionality and (2) serious bugs 
which in turn lead to the redesign of the implementation structure of some func- 
tionality. For an example of structural changes, consider a new requirement to 
support dynamic (re)configuration of components that were statically compiled 
previously. This requirement can be addressed by the Component Configurator 
[POSA2] pattern, which allows an application to link and unlink its component 
implementations at run-time without having to modify, recompile, or relink the 
application statically. This change requires that at least one class derive from
the base Component class and that at least one class be changed or introduced to
play the Component Repository role described in the Component Configurator
pattern. In such a situation, therefore, a set of classes must be changed together to 
handle the new requirement. 
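
The structural change described above can be illustrated with a minimal sketch of the two roles it mentions. This is not the ACE implementation of the Component Configurator pattern; the Python class names `Component`, `ComponentRepository`, and `InventoryLogger`, and the method names `init`, `fini`, `insert`, and `remove`, are assumptions chosen for illustration only.

```python
from abc import ABC, abstractmethod


class Component(ABC):
    """Base Component role: uniform lifecycle hooks for configurable parts."""

    @abstractmethod
    def init(self) -> None: ...

    @abstractmethod
    def fini(self) -> None: ...


class ComponentRepository:
    """Component Repository role: tracks the components currently linked in."""

    def __init__(self) -> None:
        self._components = {}

    def insert(self, name: str, component: Component) -> None:
        component.init()                    # activate when linked
        self._components[name] = component

    def remove(self, name: str) -> None:
        self._components.pop(name).fini()   # shut down when unlinked

    def names(self) -> list:
        return sorted(self._components)


class InventoryLogger(Component):
    """Hypothetical component that was previously compiled in statically."""

    def __init__(self) -> None:
        self.active = False

    def init(self) -> None:
        self.active = True

    def fini(self) -> None:
        self.active = False


repo = ComponentRepository()
logger = InventoryLogger()
repo.insert("logger", logger)   # link at run-time
linked = repo.names()
repo.remove("logger")           # unlink without touching other classes
```

Even in this small sketch, two classes change together: one takes on the Component role and another the Component Repository role, which is what makes the change structural rather than local.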

In this paper we concentrate on the structural changes because they usually affect sev- 
eral places within a software (sub)system, which indicates the existence of higher 
level relationships within the system. These relationships are usually manipulated as a 
whole, i.e., added/removed completely. As demonstrated in the patterns literature 
[POSA1, POSA2, GoF], it is possible to identify such stable relationships in the form 
of pattern languages. 

There are also aspects and relationships that cannot be modularized using OO
techniques [AOP]. For example, remoting, resource management, and transaction
handling are concerns that are seldom modularized with OO techniques. If developers are
committed to implementing a certain pattern, they rarely implement it partially,
e.g., the Observer [GoF] pattern combines both the observer and the subject. Even if
some roles in a pattern are absent, their presence is assumed implicitly during
implementation and will likely be provided later by some other developer or tool. These
observations support the view, advocated by the AOSD community, of software
systems as a set of (sometimes cross-cutting) concerns, which can be encapsulated
using OO techniques (e.g., patterns) in some cases, whereas in other cases new
approaches are required (e.g., MDA and AOSD).



3.3 Raising the Abstraction Level 

Based on the observations about the types of changes and their typical impact on
implementation presented in Section 3.2, we suggest using concerns as building blocks for
an ICMME. A concern is a specific requirement or consideration that must be
addressed to satisfy the overall system goal [RLaddad]. Each concern defines roles that
can be played by the parts of the system that implement this concern. A key issue
that must be resolved to use concerns effectively involves the relationship between
concerns and the underlying business logic.

It is possible to think about concerns as interfaces in the OO sense. In this case, the
process of assigning components to certain roles defined by concerns can be treated as
implementation of the concern. This approach leads to an interesting analogy between
interface implementation (by means of inheritance, delegation, or any other
technique) in the OO sense and the same relationships between concerns and the “base” code
implementing a certain concern. For example, if we consider the ability to demultiplex
callback events efficiently using the Asynchronous Completion Token (ACT)
[POSA2] as the concern, and there is a set of classes implementing ACT, then we
can (roughly) say that this set of classes “is an” efficient callback demultiplexer and
that they can be used wherever ACT functionality is expected without visible difference to
the ACT users who rely on ACT functionality. According to the Liskov Substitution
Principle (LSP) [LSP], this relationship between the abstract description of the efficient
demultiplexing concern (encapsulated using the ACT design pattern) and the set of classes
implementing the ACT feature is inheritance.
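
The analogy between a concern and an OO interface can be sketched as follows. The sketch is only a rough illustration in Python: the names `CallbackDemultiplexing`, `ActDemultiplexer`, and `client` are hypothetical, and the token handling is a simplification of ACT rather than the pattern as specified in [POSA2].

```python
from abc import ABC, abstractmethod


class CallbackDemultiplexing(ABC):
    """The concern, viewed as an interface (hypothetical name)."""

    @abstractmethod
    def register(self, handler) -> int:
        """Return a token that identifies the handler."""

    @abstractmethod
    def dispatch(self, token: int, result: str) -> str:
        """Route a completion event to the handler named by the token."""


class ActDemultiplexer(CallbackDemultiplexing):
    """One implementation of the concern, in the spirit of ACT."""

    def __init__(self) -> None:
        self._handlers = {}
        self._next_token = 0

    def register(self, handler) -> int:
        token = self._next_token
        self._next_token += 1
        self._handlers[token] = handler
        return token

    def dispatch(self, token: int, result: str) -> str:
        # The token returns unchanged with the completion event, so the
        # handler is found with one lookup instead of a search.
        return self._handlers.pop(token)(result)


def client(demux: CallbackDemultiplexing) -> str:
    """Code written against the concern, not against a concrete class."""
    token = demux.register(lambda r: "handled " + r)
    return demux.dispatch(token, "reply-1")


outcome = client(ActDemultiplexer())
```

Because `client` depends only on the abstract concern, any other set of classes that fills the same roles could be substituted for `ActDemultiplexer`, which is the substitution property the text appeals to.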

By defining such fundamental relationships between concerns presented in the form 
of design patterns or any other role-based definition of some functionality and imple- 
mentation of this functionality as a role mapping to available business logic, we can 
provide a powerful mechanism to encapsulate the variability at a higher abstraction 
level than that provided by conventional third-generation programming languages - in 
particular, we can encapsulate the impact of middleware platform variability on the 
rest of the system. The primary advantage of this approach is the ability to systematically
introduce changes to the system using roles defined by concerns. For example,
if developers want to add a Visitor pattern [GoF] implementation to the code, a wizard
supported by the modeling tools could guide the user through the role mapping process
to make sure that all roles defined by the Visitor pattern are mapped by the
developers to the appropriate implementation classes.
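
The completeness check that such a wizard would perform can be sketched in a few lines. The role names follow the Visitor participants listed in [GoF]; the function `unmapped_roles` and the implementation class names in the mapping are hypothetical and do not come from the ITS code base.

```python
# Participants of the Visitor pattern, treated here as the roles of a concern.
VISITOR_ROLES = {
    "Visitor", "ConcreteVisitor", "Element", "ConcreteElement", "ObjectStructure",
}


def unmapped_roles(required_roles, role_mapping):
    """Return the roles that still have no implementation class assigned."""
    return sorted(role for role in required_roles if not role_mapping.get(role))


# Hypothetical partial mapping, as it might look midway through a wizard.
mapping = {
    "Visitor": ["ReportVisitor"],
    "Element": ["InventoryItem"],
    "ConcreteElement": ["Pallet", "Crate"],
    "ObjectStructure": ["Warehouse"],
}

missing = unmapped_roles(VISITOR_ROLES, mapping)

# The developer supplies the remaining role and the check passes.
mapping["ConcreteVisitor"] = ["StockCountVisitor"]
complete = unmapped_roles(VISITOR_ROLES, mapping) == []
```

A tool built along these lines could refuse to proceed to code generation until the list of unmapped roles is empty.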

Figure 4 provides a high-level view of the complete ICMME modeling process.
This figure shows how domain-specific models are used as input for modeling
application-specific business logic and either selecting existing or creating new reusable
concern models. After the role mapping process is complete, the platform-specific
model is generated, followed by the assembly of the complete executable
application.

The OMG EDOC standard [EDOC-Patterns] addresses the need for role-based de- 
scription of higher level functionality by standardizing a set of modeling and meta- 
modeling capabilities to ensure consistency and interoperability between different 
modeling tools. In addition, OMG’s MDA approach standardizes mappings from plat- 
form-independent system models (PIMs) to popular platform-specific middleware 
(PSMs), such as CORBA, EJB and WebServices. 



3.4 Handling Middleware Platform Variability Via Concern-Based ITS Design 

To provide a concrete illustration of the idea of concern-based variability localization 
within our ICMME, we now describe how we are supporting (1) platform-independent
definitions of the remoting concern of ITS, which defines the mechanisms for
passing messages between components instantiated in different processes
and possibly running on different hosts and (2) platform-specific mappings of the
remoting concern to various component middleware platforms, such as CCM or EJB.
After completing the high-level domain-specific modeling steps described in Sections
2.2 and 2.3, we next specify the set of components corresponding to the
domain-specific model elements, as well as the way these components communicate with each
other. This specification process can be guided interactively by a model-based tool,
such as the wizards used to configure various types of tools on Windows platforms.

Fig. 4. ICMME Concern-based Modeling Process

For example, to specify and manipulate the remoting concern the modeling tool can 
guide the developer through the three steps shown in Figure 5 and described below: 

1. As shown in Figure 5, step 1 involves choosing the fundamental communication
paradigm, such as the Asynchronous Message Model or the RPC Communication
Model using the Broker pattern [POSA1]. Selecting the communication paradigm
provides a more detailed specification of the roles needed to support a particular
communication type.

2. The refinement process is shown as step 2 in Figure 5, which uses an interactive
modeling tool to refine the model based on the set of available patterns for client
and server implementations of Broker-based distributed systems [Voelter,
Kircher]. In this step, a more fine-grained model of the roles played by components
in the broker-based distributed system is presented. This refinement process
could be performed by platform-specific roles to support various component
middleware implementation platforms, such as CORBA CCM or Sun’s EJB.

3. Step 3 of Figure 5 shows how the interactive modeling tool can be used to allow
developers to deploy and configure each component according to the roles defined
by concerns. In this step, developers can map the high-level architecture
blocks presented in Figure 1 to the corresponding roles according to the selected
remoting paradigm, which is formalized in the form of patterns according to the
EDOC specification [EDOC-Patterns]. Mapping is the process of defining which
part of a component plays the role(s) expected by certain elements of a concern.
One way to perform this mapping is to apply the on-demand remodularization
technique described in [Mezini03].

The three steps presented above in conjunction with the ICMME provide the follow- 
ing improvements in handling variability aspects: 

• Using patterns as an abstract base for a family of different implementations for
certain aspects of a distributed system provides a variability encapsulation mechanism
similar to inheritance in object-oriented programming languages, but at a
higher level of abstraction. As a result, reusability can be achieved at large scale
and larger parts of distributed applications are shielded from changes caused by
variations.

• Pattern-based modeling of different aspects formally describes what is expected
and provided by certain components and subsystems, thereby forming a solid
foundation for formally defining controllable and verifiable substitution of
implementation parts. Having such formalized descriptions helps reduce accidental
complexities caused by combining semantically inconsistent
variable parts.

• Support for model refinements using iterative model-based tools helps to simplify 
the process of finding variation points for complex cases where it is hard for sys- 
tem architects and developers to foresee all variability aspects in advance. 

• Code generation using multiple fine-granularity model interpreters explicitly 
tuned for certain modeling functionality addresses the problem of overly complex 
code generators, which could otherwise obviate the benefits of the higher-level 
modeling techniques described in this paper. 



4 Related Work 

This section reviews work related to our integrated concern and model manipulation 
environment (ICMME) and describes how modeling, domain analysis, and generative 
programming techniques are being used to model and provision component-based 
distributed systems to handle variability more effectively than conventional software 
development techniques. 

Our work on ICMME extends earlier work on Model-Integrated Computing (MIC) 
[Janos:97, HarelGery:96, Lin:99, Gray:01] that focused on modeling and synthesizing 
embedded software. Examples of MIC technology used today include GME 
[GME:01] and Ptolemy [Lee:94] (used primarily in the real-time and embedded do- 
main) and Model Driven Architecture (MDA) [MDA] based on UML [UML:01] and 
XML [XML:00] (which have been used primarily in the business domain). Our work 
on ICMME combines the GME tool and the UML modeling language to model and synthesize
component middleware used to configure and deploy distributed applications.

Fig. 5. Remoting Aspect Encapsulation using Mapping Technique



Generative programming (GP) [Czarnecki] is a type of program transformation con- 
cerned with designing and implementing software modules that can be combined to 
generate specialized and highly optimized systems fulfilling specific application re- 
quirements. The goals of GP are to (1) decrease the conceptual gap between program 
code and domain concepts (known as achieving high intentionality), (2) achieve high 
reusability and adaptability, (3) simplify managing many variants of a component, 
and (4) increase efficiency (both in space and execution time). Our ICMME approach
uses GP to map models to C++ code during the code-generation phase. Our code
generator for the CIAO CCM container produces code that utilizes many techniques
associated with GP. With pure GP, however, composing a consistent set of components
can be labor intensive, tedious, and error-prone. To avoid these problems,
our ICMME code generation approach complements GP techniques by introducing
high-level model constraints, which the model interpreter processes during the code
generation and model validation phases to ensure that only consistent sets of
components are composed.

Aspect-oriented software development (AOSD) is a GP technology designed to more 
explicitly separate concerns in software development. AOSD techniques make it pos- 
sible to modularize crosscutting aspects of complex distributed systems. An aspect is 
a piece of code or any higher level construct, such as implementation artifacts captured
in an MDA PSM, that describes a recurring property of a program that crosscuts
the software application, i.e., aspects capture crosscutting concerns.

Scope, Commonality, and Variability (SCV) analysis [Coplien99] is related work on
domain engineering that focuses on identifying common and variable properties of an
application domain. SCV uses this information to guide decisions about where and
how to address possible variability and where more “static” implementation
strategies could be used. Our ICMME approach complements SCV at the phase where
the abstract definition of commonality and variability aspects must be
transformed into a model that is concrete enough for code generation. In the
nomenclature of SCV, PIMs represent the common aspects of distributed systems,
whereas PSMs implement the variability aspects. SCV defines the basic principles
and procedure to capture commonality and variability. We enhance this approach by
formalizing the models up to the level where captured commonality and variability
can be processed by a model interpreter to produce working C++ or Java source code.

5 Concluding Remarks 

Advances in hardware and software technologies are raising the level of abstraction at 
which distributed systems are developed. With each increase in abstraction comes a 
new set of complexities and variabilities that must be mastered to reap the rewards of 
the higher-level technologies. A key challenge associated with higher-level software 
abstractions is that the integration complexity makes it hard to assure the overall 
quality of the complete system. To explore the benefits of applying an Integrated 
Concern Modeling and Manipulation Environment (ICMME) and component mid- 
dleware technologies to address these challenges, we have developed an Inventory 
Tracking System (ITS) prototype, which is a distributed system that employs MDA 
tools and component middleware to address key requirements from the domain of 
warehouse management. 

Our experience gained from applying ICMME to our ITS prototype can be summarized
as follows:

• Even for mid-size distributed systems (e.g., consisting of around 20 to 50 archi- 
tectural components), the complexity reduction stemming from model-driven 
code generation can be obviated by an increase in model interpreter complexity 
caused by overly general models and interpreters. To address this problem, the 
code generation process can be split into several intermediate steps - such as 
platform-independent models (PIMs) to platform-specific models (PSMs) to 
source code transformation - to improve reusability of the model interpreter
itself. This structure helps to simplify model interpreters since each interpreter
contains less functionality. It is therefore possible to substitute only certain
modules of the interpreter to serve different application needs, thereby achieving
better reusability at the model interpreter level.

• Complexities related to the existence of variable parts within software systems
must be addressed in a systematic way via formalized descriptions of integration
points and mechanisms for variable parts of the ITS architecture. Role-based
abstract definitions of such points can be used for these purposes. For cases
where best practices are documented in the form of patterns, it is beneficial to use
patterns as a role-based, platform-independent formalization mechanism within
models. Patterns can play the same role in distributed system architectures that
abstract classes play in object-oriented design, where they provide a common
interface for a family of related implementations. Having such an abstract,
pattern-based “interface” - with many possible mappings to implementation artifacts -
helps to localize the variable aspects and shields other system components from
changes resulting from providing new implementations for certain variable
aspects.
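
The first lesson above, splitting code generation into small stages, can be sketched as a two-stage pipeline. This is an illustrative Python sketch only, not the ICMME interpreters (which target C++): the function names `pim_to_psm` and `psm_to_source`, the component name `WarehouseManager`, and the platform bindings are all assumptions.

```python
def pim_to_psm(pim: dict, platform: str) -> dict:
    """Stage 1: bind each platform-independent service to a platform choice."""
    # Illustrative bindings only; a real tool would read these from a model.
    bindings = {
        "CCM": {"remote_invocation": "CORBA ORB"},
        "EJB": {"remote_invocation": "RMI"},
    }[platform]
    return {
        "component": pim["component"],
        "platform": platform,
        "services": {s: bindings.get(s, "unbound") for s in pim["services"]},
    }


def psm_to_source(psm: dict) -> str:
    """Stage 2: emit source text from the platform-specific model only."""
    lines = [f"// {psm['component']} for {psm['platform']}"]
    for service, binding in sorted(psm["services"].items()):
        lines.append(f"// {service}: {binding}")
    return "\n".join(lines)


pim = {"component": "WarehouseManager", "services": ["remote_invocation"]}

# Only the first stage changes when the target platform changes.
ccm_source = psm_to_source(pim_to_psm(pim, "CCM"))
ejb_source = psm_to_source(pim_to_psm(pim, "EJB"))
```

Each stage knows about one transformation only, so a stage can be replaced without touching the other, which is the reusability gain at the interpreter level that the bullet describes.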

In future work, we plan to enhance our ICMME by supporting open standards, such 
as the OMG Meta-Object Facility (MOF) [MOF] and Enterprise Distributed Object 
Computing (EDOC) [EDOC] specifications to support a wide range of usage patterns 
and distributed applications. These efforts will create and validate a broader range of 
modeling tools that cover key modeling- and middleware-related aspects.

Abstract. Designing systems of asynchronous web services is challenging.
Addressing the design in terms of component reuse helps address
important questions that need to be answered if dynamic configuration 
of business solutions from web services is to be achieved. The fact that 
the components are web services doesn’t mean that all the problems of 
reuse have been solved. An architecture for dealing with reuse and dy- 
namic reconfiguration, based on stateless services and stateful messages, 
is investigated. A notation for describing the flow of documents in such 
a system is introduced. This is shown to be effective at describing the 
behaviour of components, a necessary part of designing reusable com- 
ponents, especially those that participate in long-running, asynchronous 
interactions. 



1 Introduction 

Global systems built to support long running interactions have particular re- 
quirements when it comes to reuse of components. The contemporary view of 
how such systems should be built is to deploy web services and to engineer 
business processes to coordinate interactions among these services [3]. Long-running
interactions are necessarily asynchronous [8], and asynchronous interactions
among components pose new challenges for component design, especially in the
context of reuse [5], [7], [9].

When interactions run for long periods (days, weeks) and when many business 
sessions are interleaved, it is never going to be convenient to stop the entire 
system in order to replace a component. Components therefore need to be hot- 
swapped, without disrupting the interactions in which the retiring component 
is involved, allowing the new component to continue with that sequence while 
providing some improved behaviour [6]. Web Services is the technology of choice
for building such dynamic systems. 

The plug-and-play requirement is an extreme form of reuse that requires con- 
siderations that web service architectures go a long way towards meeting. How- 
ever, designing systems for asynchronous interaction is challenging. Addressing 
the design in terms of component reuse, as we propose here, has many advantages.
In particular, we believe that considering how a web service is to be reused in
dynamic plug-and-play scenarios leads to a simpler design. In this paper we
introduce a design notation, Document Flow Model, which helps to make such
designs.



J. Bosch and C. Krueger (Eds.): ICSR 2004, LNCS 3107, pp. 185-194, 2004.
© Springer-Verlag Berlin Heidelberg 2004

2 Web Services

Let us distinguish between what we shall call low-level web services and high-level 
web services. We will use the term low-level web services when referring to the 
basic technologies provided by web servers enabled to support SOAP interactions 
and WSDL defined interfaces [1], [2]. These are the basic technologies available 
in Java Web Services and in .NET. The web service is published on a host, along 
with its WSDL interface description, in such a way that a subscriber can use 
the WSDL description to construct a stub for use in accessing the service. The 
subscriber might be using a quite different platform and technology than the 
web service publisher. 

We will use the term high-level web services when referring to the way in 
which low-level web services are orchestrated to deliver business processes [3],
[4]. Here we address the domain of high-level web services and discuss the types
of architecture and of business solution that high-level web services engender. 



Fig. 1. A basic client-server web service architecture. The arrow points from client to server.

Fig. 2. An orchestrating web service, the Agent, makes use of two other web services.



Figure 1 shows the basic client-server structure of an elementary web service 
solution. The ServiceUser invokes the web service on the ServiceProvider and 
anticipates a result. The arrow in the diagram points from client to server. 
Requests will travel in the direction of the arrow and replies (if any) will travel 
in the opposite direction. 

Figure 2 shows a slightly less trivial system. Here we have an orchestrating 
web service (called Agent here) that makes use of two other web services, coor- 
dinating or orchestrating their combined service. For example, the ServiceUser 
may make a query. That query goes to the Agent which, let us say, makes enquiries
of the two ServiceProviders that it is attached to and combines their replies in
some way before returning the combined reply to the ServiceUser.
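
The orchestration in Figure 2 can be sketched as follows. For brevity the sketch uses plain synchronous function calls in Python, whereas the paper is concerned with asynchronous interactions; the helper names `make_provider` and `make_agent`, the provider names, the catalogue contents, and the rule for combining replies are all hypothetical.

```python
def make_provider(name, catalogue):
    """A ServiceProvider that answers a query from its own catalogue."""
    def provider(query):
        return {"from": name, "result": catalogue.get(query)}
    return provider


def make_agent(providers):
    """An Agent: a server to the ServiceUser, a client to the providers."""
    def agent(query):
        replies = [provider(query) for provider in providers]
        # Assumed combination rule: keep every provider that had an answer.
        return {r["from"]: r["result"] for r in replies if r["result"] is not None}
    return agent


s1 = make_provider("s1", {"widget": 12})
s2 = make_provider("s2", {"widget": 9, "gadget": 3})
agent = make_agent([s1, s2])

# The ServiceUser sees a single service and a single combined reply.
combined = agent("widget")
partial = agent("gadget")
```

From the ServiceUser's point of view, `agent` is called exactly as a single provider would be, which is what later allows an Agent to stand in for a ServiceProvider.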

Of course, it is still possible for a ServiceUser to interact directly with either,
or both, of the ServiceProviders. In general, many ServiceUsers will interact
concurrently. This means that service providers (and in this sense the Agent is also
a service provider) must expect to process these interactions in an interleaved
fashion. This would be true whether the interactions were synchronous or asynchronous,
but for the reasons given earlier (that we are concerned primarily with very
long-running interactions) we will concentrate here on asynchronous interactions.

Recall that the arrows in these diagrams point from client to server. The 
Agent is a web service acting as a server to the ServiceUser and a client to the 
ServiceProviders. When many ServiceUsers are active each Agent and each Ser- 
viceProvider will see the messages comprising the interactions in an arbitrarily 
interleaved order. The reuse issues here are numerous. A new type of Agent or 
ServiceUser could be deployed and would need to interact with Agents and Servi- 
ceProviders that were in the middle of their interactions with others. Moreover, 
a new ServiceProvider might be required to replace an existing ServiceProvider 
and have to be in a position to complete any of the interactions that the original 
ServiceProvider had started but not yet completed. 

Figure 3 shows how a single ServiceProvider from Figure 2 can be replaced 
by a coordinated network of (in this case) two other ServiceProviders. The as- 
sumption here is that the interfaces that the Agent implements are such that 
to a ServiceUser the Agent looks just like a ServiceProvider and to a Service- 
Provider the Agent looks just like a ServiceUser. Whilst such a neat arrangement 
of the interfaces seems to solve some of the reusability issues (for example, we 
know what type of interface plugs in where) it exacerbates other problems. In 
particular, when it comes to the behaviour of interactions across an interface, 
the ability to unplug one component and plug in an alternative is non-trivial, as 
we shall show. 




Fig. 3. One ServiceProvider can be replaced by a coordinated network of others

Fig. 4. Making a Web Service stateless by separating off the state into a, probably persistent, Resource [4]. Shaded boxes are stateful in these diagrams.



One aspect of our ability to simply unplug something in the middle of a long- 
running interaction and plug in an alternative, is whether or not the component 
has state. As in [4], [10], we will distinguish between components that have state 
and interactions that have state. In the diagrams so far we have used shading 
to indicate components that have state. We have assumed ServiceProviders are 
stateful and Agents and ServiceUsers are stateless. This is an arbitrary choice, 
to illustrate a point. 

Replacing a stateful component with another is always going to be more 
difficult than replacing a stateless component with another. This is one of the 
reasons that one of the principal design criteria for Web Services is that they
should be stateless [11]. 

There are two distinct mechanisms for making Web Services stateless. The 
first is to separate the state into an independent (probably persistent) com- 
ponent, such as a database, leaving the functionality in a replaceable stateless 
component (see Figure 4). The second is to put the state in the interaction (using
cookies, session objects, etc.), which is considerably more powerful than one might
imagine, although it is not sufficient on its own. The next section addresses these
issues explicitly. 
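
The two mechanisms can be contrasted in a small sketch. It is written in Python under simplifying assumptions: the names `Resource`, `reserve`, and `add_to_basket`, and the stock and basket contents, are hypothetical, and the Resource is an in-memory object rather than a persistent store.

```python
class Resource:
    """Stateful store, separated from the service that uses it."""

    def __init__(self):
        self.stock = {"widget": 5}


def reserve(resource, item):
    """Mechanism 1: a stateless function that keeps its state in a Resource."""
    if resource.stock.get(item, 0) > 0:
        resource.stock[item] -= 1
        return True
    return False


def add_to_basket(message):
    """Mechanism 2: the state (a basket) travels inside the interaction."""
    basket = message.get("basket", []) + [message["item"]]
    return {"basket": basket}


shared = Resource()

# Neither function remembers anything between calls, so any replacement
# instance could handle the next step of the same conversation.
first = add_to_basket({"item": "widget"})
second = add_to_basket({"item": "gadget", **first})
reserved = reserve(shared, "widget")
```

The basket survives between calls only because the caller sends it back, and the stock survives only because it lives in the shared Resource; the service functions themselves hold nothing.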

A Web Service is in reality deployed in a container (Figure 5). One of the 
functions of the container is to decide whether a request is handled by an existing 
instance of a component or by a new instance. Another (orthogonal) decision is 
whether the container generates a new thread for each request. The assumption
we make is that components in the diagrams are instances and that the container
will indeed use a new thread for each request. This means that requests
can in fact overtake each other, which significantly complicates reasoning about
asynchronous behaviour.

Fig. 5. The container in which web services are deployed determines properties of components. Usually the container is responsible for demultiplexing messages and forwarding them, concurrently, to specific instances of components.

Fig. 6. Reusable Web Services with stateless components and stateful interactions

Figure 6 shows the cumulative consequence of the assumptions we have out- 
lined in this section. We show three (now stateless) ServiceProviders around 
a shared stateful Resource. The ServiceUsers are engaged in interactions (con- 
versations) with these services. These interactions are probably stateful, in the 
sense that the requests and replies carry in them contextual information about 
the state of the interaction (such as the contents of a basket). When two Ser- 
viceUsers make use of the same service, their interactions will be interleaved. 
The use of context in the interactions means that, in a long-running interaction, 
part of it may be handled by one ServiceProvider and part of it by another. The 
statelessness of the Service is what makes this desirable property easily achieved.

3 Document Flow Model

We have established the need to design asynchronous interactions among Web 
Services. In this section we introduce a design notation for reasoning about such 
interactions. We use this to show some of the consequences for reusability of 
components, when we are explicit about the behaviour in which they engage. In 
particular we address an important issue concerning the parallel delegation of 
work. 

The design notation we are going to introduce is based on our experience 
with XML and its associated technologies. Because we concentrate on the se- 
quence in which messages are sent and the consequences for a component of 
receiving a message, and since for us messages are documents, we call the design 
notation Document Flow Model (DFM). We use DFM to design systems which 
are eventually realised using XML encoded documents. 

Here is a document. 

[to:s1, from:u, query:q] 

It could be the message sent from the ServiceUser to the ServiceProvider in 
Figure 1. A document is an object (or a record) with named attributes, each of 
whose values is either an atom or a document. 

The only other part of DFM is that we show the action performed by a 
service provider on receipt of a message. In an asynchronous world this usually 
comprises querying and updating local state (if any) and sending replies or, more 
generally, further messages. So, for example we can specify the behaviour of the 
ServiceProvider in Figure 1 by 

onMessage [to:s1, from:u, query:q] 
    send [to:u, from:s1, reply:[query:q, result:r]] 

Here we have used the incoming message as a pattern, where the values of the 
attributes are taken to be names for the relevant parts of the message. The 
response of the ServiceProvider is to construct a reply that is self-identifying by 
virtue of the fact that it contains sufficient details of the original query that 
the receiver will be able to re-establish its context. We assume the 
ServiceProvider can compute the reply detail (r) from the query detail (q). Because 
the reply carries [query:q, result:r], the ServiceUser doesn’t have to remember 
which query was sent where, or in what order the queries were sent. The replies 
can return in any order and still be processed. Putting the query in the reply is 
the simplest example of adding context (state) to an interaction. 
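
As a rough illustration of the self-identifying reply, the following Python sketch models DFM documents as plain dictionaries. The function names and the `compute_result` stand-in are assumptions made for this example; they are not part of the DFM notation.

```python
# Hypothetical sketch: DFM documents modelled as plain dicts.

def compute_result(query):
    """Stand-in for the provider's business logic (an assumption)."""
    return query.upper()

def service_provider_on_message(doc):
    """onMessage [to:s1, from:u, query:q]
       send [to:u, from:s1, reply:[query:q, result:r]]"""
    q = doc["query"]
    r = compute_result(q)
    # The reply repeats the query, so it identifies itself to the receiver.
    return {"to": doc["from"], "from": doc["to"],
            "reply": {"query": q, "result": r}}

# The user sends two queries and keeps no record of what was sent where.
requests = [{"to": "s1", "from": "u", "query": q} for q in ("alpha", "beta")]
replies = [service_provider_on_message(d) for d in requests]

answers = {}
for reply in reversed(replies):  # simulate replies arriving out of order
    answers[reply["reply"]["query"]] = reply["reply"]["result"]
```

Because each reply names the query it answers, the loop can consume the replies in any order and still pair every result with its query.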

Now let us look at the interactions that may take place in the system shown 
in Figure 2. Assume the ServiceUser sends the following document 

[to:a, from:u, query:q, context:c] 

It’s almost the same document as before, but this time sent to an Agent. This 
time the document carries an extra element, the context, whose purpose will 
soon become clear. The ServiceUser doesn’t care about whether they are talking 
to a ServiceProvider or an Agent. 

The Agent’s response is 

onMessage [to:a, from:u, query:q, context:c] 
    send [to:s1, from:a, query:q1, 
          context:[from:u, query:q, context:c]] 

where we have assumed q1 is a part of the query q, specifically to be addressed 
to s1. This is a request to a ServiceProvider in the format expected by the 
ServiceProvider, in the sense that it has all the attributes that the 
ServiceProvider expects (i.e. to, from and query). We are using the convention of 
XML that extra elements in a message are acceptable. In this case, we are going 
to use them. But sometimes we will just ignore them. 

The response from the ServiceProvider is 

onMessage [to:s1, from:a, query:q1, context:c] 
    send [to:a, from:s1, reply:[query:q1, result:r1], context:c] 

Again, we see that the reply is in the form that a ServiceProvider normally sends 
to a ServiceUser, but carrying extra state in the context component. The design 
of these documents is entirely at the discretion of the designer of the interaction. 
There is nothing special about any of the attributes (except that we assume the 
messaging system makes use of the to field). 

Next, the Agent sends the query on to the second ServiceProvider 

onMessage [to:a, from:s1, reply:[query:q1, result:r1], 
           context:[from:u, query:q, context:c]] 
    send [to:s2, from:a, query:q2, 
          context:[from:u, query:q, context:c, 
                   reply:[query:q1, result:r1]]] 

again, assuming q2 is a subquery of q. And so it goes on. The second 
ServiceProvider replies 

onMessage [to:s2, from:a, query:q2, context:c] 
    send [to:a, from:s2, reply:[query:q2, result:r2], context:c] 

and so eventually, the Agent can complete the interaction 

onMessage [to:a, from:s2, reply:[query:q2, result:r2], 
           context:[from:u, query:q, context:c, 
                    reply:[query:q1, result:r1]]] 
    send [to:u, from:a, reply:[query:q, result:[r1,r2]], 
          context:[from:u, query:q, context:c]] 

This sequence of document exchanges, where we have assumed the Agent’s task 
is to consult both ServiceProviders, is a sequential solution. The two Service- 
Providers are consulted in sequence. The ServiceProviders actually respond to 
the queries they receive in exactly the same way, notwithstanding the different 
details we have given above. They compute their reply to the query (using local 
state, if necessary) and return their reply along with the context they received, 
no matter how complex that context is. 
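
The sequential exchange can be sketched in Python as below. This is only an illustration under stated assumptions: documents are dictionaries, `split_query` is a hypothetical way of deriving the sub-queries, and the `run` loop stands in for the messaging system. The point it tries to show is that the Agent keeps no state between messages; everything it needs travels in the context.

```python
def split_query(q):
    """Hypothetical: derive the two sub-queries q1 and q2 from q."""
    return q + "/1", q + "/2"

def provider_on_message(doc):
    # A provider returns its reply together with whatever context it received.
    return {"to": doc["from"], "from": doc["to"],
            "reply": {"query": doc["query"],
                      "result": "r(" + doc["query"] + ")"},
            "context": doc.get("context")}

def agent_on_message(doc):
    if "query" in doc:  # initial request from the user
        q1, _ = split_query(doc["query"])
        ctx = {"from": doc["from"], "query": doc["query"],
               "context": doc.get("context")}
        return {"to": "s1", "from": "a", "query": q1, "context": ctx}
    ctx = doc["context"]
    if doc["from"] == "s1":  # first reply: carry it forward in the context
        _, q2 = split_query(ctx["query"])
        return {"to": "s2", "from": "a", "query": q2,
                "context": dict(ctx, reply=doc["reply"])}
    # second reply: the message alone holds both results
    r1 = ctx["reply"]["result"]
    r2 = doc["reply"]["result"]
    return {"to": ctx["from"], "from": "a",
            "reply": {"query": ctx["query"], "result": [r1, r2]},
            "context": {"from": ctx["from"], "query": ctx["query"],
                        "context": ctx["context"]}}

HANDLERS = {"a": agent_on_message,
            "s1": provider_on_message,
            "s2": provider_on_message}

def run(doc):
    """Deliver documents until one is addressed to a party without a handler."""
    while doc["to"] in HANDLERS:
        doc = HANDLERS[doc["to"]](doc)
    return doc

final = run({"to": "a", "from": "u", "query": "q", "context": "c"})
```

Both providers share one handler, which mirrors the observation that they respond in exactly the same way regardless of how complex the context is.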

It is here that the reuse guideline has come into play. In Figure 3 we showed 
a deployment of the Agent that allows for a significant expansion of the 
plug-and-play requirement upon these components. In particular it allowed us to 
build networks of Agents that work together to process complex queries. This 
requirement manifests itself in the way that the Agent constructs its reply to q2. 
It returns the reply along with the context of that reply. The context is needed 
in the case that the reply is being returned to another Agent. 

The Agent responds to the documents it receives in a stateless way. It uses 
the information in the document to demultiplex the messages it receives. So it 
can tell from the second reply that the entire query has been resolved and that 
it must reply to the user. 

But if it had sent the two queries off to the two ServiceProviders concurrently, 
putting the context in the message wouldn’t suffice. The Agent would need to 
remember the first reply until it got the second. The Agent would need to be 
stateful. This is the problem we will solve in the next section. 



4 Contexts and Coordination 



We introduce the notion of a Context Store in which we will store contexts (see 
Figure 7). Each context will be stored under a unique identifier (uid) and this 
identifier passed between services as a means of coordination. A Web Service 
can have its own unique Context Store, or it can share a Context Store with 
someone else. In general, shared Context Stores will be used to maintain the 
statelessness of the Web Service. Rather than reprogram the sequential solution 
to make use of this concept, we shall program a parallel solution. 



onMessage [to:a, from:u, query:q, context:c] 
    generate new uid 
    store uid -> [from:u, query:q, context:c] in CS 
    send [to:s1, from:a, query:q1, context:uid] 
    send [to:s2, from:a, query:q2, context:uid] 

onMessage [to:si, from:a, query:qi, context:uid] 
    send [to:a, from:si, reply:[query:qi, result:ri], context:uid] 

onMessage [to:a, from:si, reply:[query:qi, result:ri], context:uid] 
    store uid -> [query:qi, result:ri] in CS 
    if CS[uid] contains [from:u, query:q, context:c], 
                        [query:q1, result:r1] and [query:q2, result:r2] 
    then send [to:u, from:a, 
               reply:[query:q, result:[r1,r2]], context:c] 

First, the ServiceUser makes its usual request. The Agent’s response is to store 
the query in the Context Store (CS) under the name uid, which is a completely 
fresh unique identifier. The Agent then sends queries (q1 and q2) to the two 
ServiceProviders simultaneously (well, asynchronously anyway) and gets on with its 
business of servicing other interleaved queries and replies. The ServiceProviders 
act as they always have, computing their reply in terms of their local state and 
the query and returning that to the Agent from which it came. The Agent 
receives these replies in an undetermined order. Our solution is to first store the 
replies in the Context Store. We can do this because the ServiceProvider was 
cooperative enough to return the uid with the reply. Each time we get a reply, we 
look to see if we now have enough information to complete the Agent’s task. This 
will happen of course when the second reply arrives. But this structure obviously 
generalises to more than two ServiceProviders. When enough information has 
been gathered, the Agent replies to the user, as before. 

Fig. 7. Making use of a Context Store to give an Agent some state 
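
A minimal Python sketch of the parallel Agent follows. It assumes the Context Store is an in-memory dictionary keyed by uid, that uids come from a simple counter, and that `split_query` (a hypothetical helper) yields the sub-queries; a real deployment would use a shared external store.

```python
import itertools

CS = {}                     # Context Store: uid -> list of stored documents
_uids = itertools.count(1)  # hypothetical source of fresh unique identifiers

def split_query(q):
    """Hypothetical: derive the sub-queries q1 and q2 from q."""
    return [q + "/1", q + "/2"]

def agent_on_request(doc):
    uid = next(_uids)
    CS[uid] = [{"from": doc["from"], "query": doc["query"],
                "context": doc["context"]}]
    # Both sub-queries go out at once; only the uid travels with them.
    return [{"to": s, "from": "a", "query": qi, "context": uid}
            for s, qi in zip(("s1", "s2"), split_query(doc["query"]))]

def provider_on_message(doc):
    return {"to": doc["from"], "from": doc["to"],
            "reply": {"query": doc["query"],
                      "result": "r(" + doc["query"] + ")"},
            "context": doc["context"]}

def agent_on_reply(doc):
    uid = doc["context"]
    CS[uid].append(doc["reply"])       # store first, then check for completion
    original, *stored = CS[uid]
    wanted = split_query(original["query"])
    by_query = {r["query"]: r["result"] for r in stored}
    if all(q in by_query for q in wanted):
        return {"to": original["from"], "from": "a",
                "reply": {"query": original["query"],
                          "result": [by_query[q] for q in wanted]},
                "context": original["context"]}
    return None  # not enough information yet

out = agent_on_request({"to": "a", "from": "u", "query": "q", "context": "c"})
replies = [provider_on_message(d) for d in out]
first = agent_on_reply(replies[1])   # the reply from s2 happens to arrive first
second = agent_on_reply(replies[0])
```

The handlers themselves hold no per-query state; the order in which the two replies arrive does not change the final document.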

Thus we see that the combination of a Context Store and stateful interactions, 
where the state in the message is simply a uid, is sufficient to solve the problem 
of parallelising the interaction. This is a conventional solution to the problem 
that we have captured succinctly in the DFM notation. 

But there is another valuable consequence of this design, important for 
dynamic deployment and for component reuse: the arrangement of multiple instances 
of a Web Service around a shared resource that we showed in Figure 6 
generalises to multiple instances of an Agent. For the case of a sequential Agent, 
this is shown in Figure 8. 




Fig. 8. Document Flow to different instances of a sequential Agent 

Fig. 9. Document Flow to separate instances of a parallel Agent 

Figure 8 shows a special case where the request flows through three 
separate instances of the Agent. Since they share the same Context Store, that 
works, although it would of course have worked if the whole context had been 
in the message. The dotted lines in this diagram show messages. One solid line 
is replaced by two dotted lines when the reply comes from or goes to a different 
instance than the request. 

Figure 9 shows the document flow in the case of a parallel Agent. This time, 
either one of the Agents that receives a reply from the ServiceProvider could be 
the one that realises the query is complete and hence replies to the user. 

5 Discussion 

Web Services are intended to be reusable. That is the whole idea. A Web Service 
is a bit of business logic published on the web for anyone (or any authorised one) 
to use. Its interface is published as syntax and its behaviour is described as a 
document-transformer. By providing the means for dynamic binding of Web 
Services to application code, this technology goes a long way towards realising 
the dreams of reusable components that we have had for a long time in software 
engineering. 

But it is not without its own problems. We want the systems built from 
Web Services to remain loosely coupled, so that unplugging a component and 
plugging in another is not disruptive of a long-running interaction. It should 
be the case, in a system of reusable components, that taking a component out 
degrades rather than damages or stops the system [6]. It should certainly be 
the case that putting the same component back into the same slot means that 
everything continues as normal, the only effect having been the inevitable delay. 
So here we have the potential for extreme reuse: a box of components, all of which 
can be plugged into that vacant slot, either providing different functionality, or 
the same functionality in different ways. 

We have shown that a simple service can be replaced by a more complex 
one (Figure 3). We have shown that many instances of a Web Service can be 
substituted for a single instance (Figure 6). We have shown that a sequential 
service can be replaced by a parallel one (Figure 9). And we have, by these 
constructions, shown that multiple instances, sequential services and parallel 
services can be mixed and matched in a straightforward plug-and-play way. 

The DFM notation has enabled us to make these claims explicit. The models 
shown here have been validated by building actual implementations using SOAP 
messaging and by testing these implementations extensively. We have shown that 
the components designed here are indeed reusable in all the contexts in which 
we have shown them. We are in the process now of extending this validation to 
include complete (finite) state space search, a process enabled by the formality 
of the DFM notation. 

6 Conclusions 

Designing systems of asynchronous web services is challenging. Addressing the 
design in terms of component reuse has forced us to address important questions. 
These are the same questions that need to be answered if dynamic configuration 
of business solutions from web services is to be achieved. We have defined an 
architecture for dealing with reuse and dynamic reconfiguration, based on 
stateless services and stateful messages. We have defined a notation for describing 
the flow of documents in such a system. We have shown that this notation is 
effective at describing the behaviour of components, a necessary part of making 
components that are reusable by others. The fact that our components are web 
services didn’t mean that all the problems of reuse had been solved. We exposed 
some of them using a formal document flow notation and showed that some 
conventional solutions to these problems, specifically a Context and a Context 
Store, are indeed effective. 

Abstract. For successful COTS component selection and integration, 
composers increasingly look at software measurement techniques. 
However, determining the complexity of a component’s adapter is still 
an ongoing concern. Here, a suite of measures is presented to address 
this problem within a COTS-based software measurement activity. Our 
measures are based on a formally defined component-based model, aim- 
ing at expressing and measuring some aspects of component adaptations. 

Keywords: Component-based system assessment. COTS components. 
Software Quality. Metrics. 



1 Introduction 

The rigorous measurement of reuse will help developers determine current levels 
of reuse and help provide insight into the problem of assessing software that is 
easily reused. Some reuse measures are based on comparisons between length or 
size of reused code and the size of newly written code of software components [12]. 
Particularly, measurement programs can facilitate incorporating an engineering 
approach to component-based software development (CBSD), and specifically 
to COTS component selection and integration, giving composers a competitive 
advantage over those who use more traditional approaches. 

Measurements let developers identify and quantify quality attributes in 
such a way that risks encountered during COTS selection are reduced. Then, 
measurement information might be structured as in the proposal in [11], in which 
a methodology facilitates the evaluation and improvement of reuse and 
experience repository systems by iteratively conducting goal-oriented measurement 
programs. However, most cost estimates for CBS developments are based on 
rules of thumb involving some size measure, like adapted lines of code, number 
of function points added/updated, or more recently, functional density [1,9]. 



J. Bosch and C. Krueger (Eds.): ICSR 2004, LNCS 3107, pp. 195-204, 2004. 

To address this problem, in a previous work [4] we have adapted the 
model introduced in [2], which explores the evaluation of components using a 
specification-based testing strategy, and proposes a semantics distance measure 
that might be used as the basis for selecting a component from a set of 
candidates. 

By adapting this model, we have set a preliminary suite of measures for 
determining the functional suitability of a component-based solution. However, 
our measures are based on functional direct connections, i.e. there is no semantic 
adaptation between the outputs provided by a concrete component and its 
required functionality. The importance of defining functional adaptability 
measures comes from the importance of calculating the tailoring effort during 
COTS component integration. When analysing components, it is usually the 
case that the functionality required by the system does not semantically 
match the functionality provided by the candidate components. Detecting 
additional or missing functionality is the more common case. 

In this paper, we extend our previous suite of measures to quantify 
the components’ functional adaptability, in such a way that our measures may 
be combined with or used by other approaches. 

In section 2 of the paper, we introduce the component-based model for 
measurement (from [2]) along with a motivating example. Then, section 3 presents 
a suite of measures for determining the degree to which a component solution 
needs adaptation. Section 4 presents two possible applications of our proposal 
by combining some related works. Finally, section 5 addresses conclusions and 
topics for further research. 



2 An Adaptation Model for Measurement 

At the core of all definitions of software architecture is the notion that the 
architecture of a system describes its gross structure, including things such as 
how the system is composed of interacting parts, where the main pathways 
of interaction are, and what the key properties of the parts are [13]. 

Components are plugged into a software architecture that connects partici- 
pating components and enforces interaction rules. For instance, the model in [2] 
supposes that there is an architectural definition of a system, whose behaviour 
has been depicted by scenarios or using an architecture description language 
(ADL), which usually provides both a conceptual framework and a concrete 
syntax for characterising software architectures. 

The system can be extended or instantiated through the use of some 
component type. Because several instantiations might occur, an assumption is 
made about what characteristics the actual components must possess from the 
architecture’s perspective. Thus, the specification of the architecture A ($S_A$) 
defines a specification $S_C$ for the abstract component type C (i.e. $S_A \Rightarrow S_C$). 
Any component $K_i$ that is a concrete instance of C must conform to the 
interface and behaviour specified by $S_C$. The process of composing a component 
K with A is an act of interface and semantic mapping. In this work, only the 
latter will be addressed. 

Quantifying COTS Component Functional Adaptation 

During the mapping, it can be the case that the semantics of K are not 
sufficient for $S_C$ (i.e., $\neg(S_K \Rightarrow S_C)$). In this situation, K is somehow lacking 
with respect to the behavioural semantics of C. The possibility is that K has 
partial behavioural compatibility with C. In this case, K either has incompatible 
or missing behaviour with respect to some portion of $S_C$. To overcome this, a 
semantic adapter must be specified (and built) such that, when composed 
with $S_K$, the adapter yields a component that is compatible with C [2]. 

The composition of this specification, $A_{S_C}^{S_K}$, and $S_K$ must satisfy 
$(A_{S_C}^{S_K} \circ S_K) \Rightarrow S_C$, as shown in Figure 1 (from [2]). The dashed line 
indicates that the adapter may provide some of the behavioural semantics if the 
component K is somehow deficient. 




Fig. 1. Component K adapted for use in architecture A by adapter $A_{S_C}^{S_K}$ (from [2]) 



According to the work in [2], a number of issues arise when considering 
what behaviour $A_{S_C}^{S_K}$ must have. Firstly, all inputs in the domain of $S_C$ that are 
not included in the domain of $S_K$ must be accounted for by $A_{S_C}^{S_K}$, and likewise 
for the outputs in the range of $S_C$, i.e. the domain and range of the aggregate 
must at least include that of $S_C$. Given that the domain and range of 
$A_{S_C}^{S_K} \circ S_K$ is consistent with $S_C$, the adapter $A_{S_C}^{S_K}$ must include those 
mappings from $S_C$ that are not supported by $S_K$. Essentially, $A_{S_C}^{S_K}$ must provide 
those mappings whose domain is in $S_C$ but not $S_K$, and those mappings whose 
domain is in both $S_C$ and $S_K$ but where the element mapped to in the range of 
$S_C$ is not the same as the element mapped to in the range of $S_K$. These mappings 
are described formally in [2] as (see footnote 1): 

$$mapping(A_{S_C}^{S_K}) = \{(i,j) \mid (i \in (dom(S_C) \setminus D_{S_C}^{S_K}) \wedge S_C(i) = j) \vee (i \in D_{S_C}^{S_K} \wedge S_K(i) \neq S_C(i) \wedge S_C(i) = j)\}$$ 

Secondly, all mappings not included in $S_C$ and additionally provided by $S_K$ 
should be hidden by $A_{S_C}^{S_K}$ to simplify the integration, i.e. $A_{S_C}^{S_K}$ must hide those 
mappings whose domain is in $S_K$ and not in $S_C$ and the element mapped to in 
the range of $S_K$ is not in the range of $S_C$. These mappings can be described 
formally as: 

$$added(A_{S_C}^{S_K}) = \{(i,j) \mid i \in (dom(S_K) \setminus D_{S_C}^{S_K}) \wedge S_K(i) = j \wedge j \notin rng(S_C)\}$$ 

Finally, all mappings included in $S_C$ and provided by $S_K$ constitute the 
functionality provided by the component $K$. These mappings can be described 
formally as: 

$$funct(F_{S_C}^{S_K}) = \{(i,j) \mid i \in D_{S_C}^{S_K} \wedge S_K(i) = j \wedge S_K(i) = S_C(i)\}$$ 
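
The three sets can be computed mechanically once the specifications are written down as finite maps. The Python sketch below represents $S_C$ and $S_K$ as dictionaries and, as an assumption of this example (the symbol is not defined in the text above), takes $D_{S_C}^{S_K}$ to be the set of inputs that both specifications accept. The toy data is invented purely for illustration.

```python
def shared_domain(s_c, s_k):
    # Assumption: D is the set of inputs common to both specifications.
    return set(s_c) & set(s_k)

def mapping(s_c, s_k):
    """Tuples the adapter must supply: missing inputs plus mismatched outputs."""
    d = shared_domain(s_c, s_k)
    missing = {(i, j) for i, j in s_c.items() if i not in d}
    mismatched = {(i, s_c[i]) for i in d if s_k[i] != s_c[i]}
    return missing | mismatched

def added(s_c, s_k):
    """Tuples the adapter should hide: extra behaviour of the component."""
    d = shared_domain(s_c, s_k)
    rng_c = set(s_c.values())
    return {(i, j) for i, j in s_k.items() if i not in d and j not in rng_c}

def funct(s_c, s_k):
    """Tuples on which the component already agrees with the specification."""
    d = shared_domain(s_c, s_k)
    return {(i, s_k[i]) for i in d if s_k[i] == s_c[i]}

# Invented toy specifications (not taken from the paper).
s_c = {"a": 1, "b": 2, "c": 3}
s_k = {"a": 1, "b": 9, "x": 7}

to_add = mapping(s_c, s_k)
to_hide = added(s_c, s_k)
provided = funct(s_c, s_k)
```

Here input `c` is missing from the component, input `b` is mapped to the wrong output, and input `x` is extra behaviour that the specification never asked for.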

2.1 A Motivating Example: Credit Card E-payment 

Authorisation and Capture are the two main stages in the processing of a card 
payment over the Internet. Authorisation is the process of checking the cus- 
tomer’s credit card. Capture is when the card is actually debited. 

We suppose the existence of some scenarios describing the two main stages, 
which represent here a credit card (CCard) payment system. The scenarios will 
provide an abstract specification of the input and output domains of Sc that 
might be composed of: 

- Input domain: (AID) Auth_IData{#Card, Cardholder_Name, Exp_Date, 
  Bank_Acc, Amount}; (CH) Cardholder_ID; 
  (CID) Capture_IData{Bank_Acc, Amount}. 
- Output domain: (AOD) Auth_OData{okAuth}; (CHC) Cardholder_Credit; 
  (COD) Capture_OData{ok_capture, DB_update}. 
- Getting Authorization: {AID $\mapsto$ AOD}. 
- Calculating Credit: {CH $\mapsto$ CHC}. 
- Capture: {CID $\mapsto$ COD}. 

Suppose we pre-select two components to be evaluated, namely $K_1$ and $K_2$. 
However, the specification mapping, shown in Figure 2, reveals some 
inconsistencies that should be analysed. 

1 Comparison between ranges has been simplified by considering equality. A more 
complex treatment of ranges might be similarly specified, for example, by defining 
a set of data flows related by set inclusion. 

Fig. 2. Functional mappings of $S_C$ and $S_{K_1}$/$S_{K_2}$ (from [2]) 



3 A Measurement Suite for Functional Adaptability 

For the measure definitions, we assume a conceptual model with a universe of 
scenarios $S$; an abstract specification of a component C; a set of components 
$\mathcal{K}$ relevant to C; and a mapping component diagram. In the following 
definitions, we use the symbol # for the cardinality of a set. To simplify the 
analysis, we also assume input/output data as data flows, i.e. data that may 
aggregate some elemental data. For the credit card example, input/output data 
are represented by {AID, CH, CID} and {AOD, CHC, COD} respectively. 

3.1 Implemented Functional Adaptability Measures 

Table 1 lists the proposed measures for measuring functional adaptability cases. 
The measures have been grouped into two main groups: component-level 
measures and solution-level measures. The first group of metrics aims at detecting 
incompatibilities on a particular component $K$, which is a candidate to be 
analysed. For example, $EF_{A_{S_C}^{S_K}}$ aims at measuring the number of functions that 
are added when implementing the adapter of the component $K$ (extended 
functionality). 

However, we could need to incorporate more than one component to satisfy 
the functionality required by the abstract specification C. In this case, the second 
group of metrics evaluates the functional adaptability of all components that 
constitute the candidate solution $SN$. For example, $HF_{A_{S_C}^{S_{SN}}}$ aims at measuring 
the number of functions hidden when implementing the adapter/s of the solution 
$SN$. 

It is important to note that the amount of functionality implemented by the 
adapter depends on a design decision, that is, $EF_{A_{S_C}^{S_K}}$ does not represent the 
number of tuples of the mapping $A_{S_C}^{S_K}$, but the number of tuples (functions) 
actually added by the adapter. Similarly, $HF_{A_{S_C}^{S_{SN}}}$ represents the number of 
tuples (functions) actually hidden by the adapter. Therefore, we define the 
notions of Added Mapping and Hidden Mapping as follows: 

$$AM(A_{S_C}^{S_K}) = \{(i,j) \mid (i,j) \in mapping(A_{S_C}^{S_K}) \wedge A_{S_C}^{S_K}(i) = j \wedge S_C(i) = j\}$$ 

$$HM(A_{S_C}^{S_K}) = \{(i,j) \mid (i,j) \in added(A_{S_C}^{S_K}) \wedge A_{S_C}^{S_K}(i) = \{\}\}$$ 

Table 1. Description of the Functional Adaptability measures 

Then, contribution measures are defined to measure the completeness of the 
adapters’ functionality. For example, $AC_{A_{S_C}^{S_K}}$ aims at measuring the percentage 
to which a component adapter contributes to the extended functionality 
required by $S_C$ in the scenario $S$. A more formal definition of the measures is 
shown in Table 1 (see footnote 2). 

Let’s compute the measures for our credit card example, where: 

$mapping(A_{S_C}^{S_{K_1}}) = \{CH \mapsto CHC;\ CID \mapsto COD\}$, 
$added(A_{S_C}^{S_{K_1}}) = \{\}$, 
$mapping(A_{S_C}^{S_{K_2}}) = \{CH \mapsto CHC;\ AID \mapsto AOD\}$, and 
$added(A_{S_C}^{S_{K_2}}) = \{Taxes \mapsto Statistics\}$. 

In a first scenario, suppose that all functionality is implemented by the 
adapters; i.e. $AM(A_{S_C}^{S_K}) = mapping(A_{S_C}^{S_K})$ and $HM(A_{S_C}^{S_K}) = added(A_{S_C}^{S_K})$. 
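Hence, component-level measures for $K_1$ and $K_2$ are as follows: 

$$EF_{A_{S_C}^{S_{K_1}}} = 2, \quad HF_{A_{S_C}^{S_{K_1}}} = 0; \qquad EF_{A_{S_C}^{S_{K_2}}} = 2, \quad HF_{A_{S_C}^{S_{K_2}}} = 1$$ 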

The values of the measures show that selecting the component $K_1$ implies 
developing adaptation for adding two functions and no adaptation for hiding 
side functionality. On the other hand, selecting the component $K_2$ might lead to 
implementing the same number of functions (not necessarily the same amount 
of functionality), but it also implies hiding one function (the one represented by 
the map (Taxes, Statistics)). So, a balance is struck to decide on selecting a set 
of COTS candidates as a solution, selecting only one component, or developing 
the solution from scratch. 
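
The component-level counts discussed here can be reproduced with a few lines of Python. The sketch takes the mapping and added sets stated for the credit card example and applies the first scenario, in which the adapter implements everything. The dictionary layout and the function name are choices made for this illustration, and only the two counting measures are shown.

```python
# Sets stated in the text for the two candidate adapters.
adapters = {
    "K1": {"mapping": {("CH", "CHC"), ("CID", "COD")},
           "added": set()},
    "K2": {"mapping": {("CH", "CHC"), ("AID", "AOD")},
           "added": {("Taxes", "Statistics")}},
}

def component_level(adapter):
    am = adapter["mapping"]  # first scenario: AM equals mapping
    hm = adapter["added"]    # first scenario: HM equals added
    # EF counts the functions added, HF the functions hidden.
    return {"EF": len(am), "HF": len(hm)}

measures = {name: component_level(a) for name, a in adapters.items()}
```

Under a different design decision, where the adapter adds or hides only part of these sets, `am` and `hm` would be the corresponding subsets and the counts would drop accordingly.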

Let’s suppose that we decide to select a set of COTS components suggesting 
the solution $SN$ as the set $\{K_1, K_2\}$. Then, to calculate $EF_{A_{S_C}^{S_{SN}}}$, the override 
operation, $\dagger\, AM(A_{S_C}^{S_{K_i}})$, $1 \le i \le n$, is explicitly expressed as the calculation of 
$\{CH \mapsto CHC;\ CID \mapsto COD\} \dagger \{CH \mapsto CHC;\ AID \mapsto AOD\}$. It results in 
$\{CH \mapsto CHC;\ CID \mapsto COD;\ AID \mapsto AOD\}$, which implicitly states a selection 
when the same functionality might be provided by more than one adapter in the 
solution $SN$ (see footnote 3). 

2 $\dagger$ represents traditional map overlapping. Example: 
$\{3 \mapsto true, 5 \mapsto false\} \dagger \{5 \mapsto true\} = \{3 \mapsto true, 5 \mapsto true\}$. 

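Then, we continue computing the solution-level measures for our example 
as follows: 

$$(\dagger\, AM(A_{S_C}^{S_{K_i}}))_{\forall K_i \in SN} \setminus (\dagger\, funct(F_{S_C}^{S_{K_i}}))_{\forall K_i \in SN} =$$ 
$$\{CH \mapsto CHC;\ CID \mapsto COD;\ AID \mapsto AOD\} \setminus \{CID \mapsto COD;\ AID \mapsto AOD\} = \{CH \mapsto CHC\}$$ 

$$EF_{A_{S_C}^{S_{SN}}} = 1; \quad HF_{A_{S_C}^{S_{SN}}} = 1$$ 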
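
The solution-level calculation can be sketched in Python with dictionaries standing in for the maps. The `override` helper mimics the map override operation, in which later maps win on shared keys. Two things here are assumptions of this sketch rather than statements from the text: the provided functionality is taken directly as the combined set that appears in the calculation, and the solution-level hidden count is taken as the size of the overridden hidden mappings.

```python
def override(*maps):
    """Map override: entries of later maps replace earlier ones on equal keys."""
    result = {}
    for m in maps:
        result.update(m)
    return result

# Added mappings of the two adapters, as stated for the example.
am_k1 = {"CH": "CHC", "CID": "COD"}
am_k2 = {"CH": "CHC", "AID": "AOD"}
# Combined functionality already provided by the components.
provided = {"CID": "COD", "AID": "AOD"}
# Hidden mappings of the two adapters.
hm_k1 = {}
hm_k2 = {"Taxes": "Statistics"}

all_added = override(am_k1, am_k2)
# Map difference: keep what the adapters add that no component provides.
still_to_add = {i: j for i, j in all_added.items() if provided.get(i) != j}

ef_solution = len(still_to_add)
hf_solution = len(override(hm_k1, hm_k2))  # assumption: count of hidden maps
```

The shared entry for CH appears in both adapters but is counted once after the override, which is the selection effect mentioned for functionality offered by more than one adapter.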

In this case, the adapter/s implement all the requirements, i.e. adding the 
missed mapping $\{CH \mapsto CHC\}$, and hiding the mapping $\{Taxes \mapsto Statistics\}$; 
hence $AC_{A_{S_C}^{S_{SN}}}$ is equal to 1 (its highest possible value). But in a more complex 
case, we could have decided not to add some missed functionality. Therefore, in 
this case the value of $AC_{A_{S_C}^{S_{SN}}}$ would be less than 1, indicating incompleteness 
for the adapter/s at the implemented level. In this case, calculating $SC_{A_{S_C}^{S_{SN}}}$ 
would be also meaningful. 



4 Related Works: Possible Uses of the Measures 

Complexity of adaptability. Focusing on adapters, each extended function implies 
interaction with target components, which must be identified to determine all 
potential mismatches. For example, a component may try to access data that 
are considered private by the target component. This mismatch detection can be 
performed on every interface connection using the procedure defined by Gacek 
[8]. Once the connection type is known, such as call, spawn, or trigger, then all of 
the mismatches associated with that connection type are potential mismatches 
for the connection. Then, for each mismatch, potential resolution techniques 
might be considered from the proposal by Deline [7], where a weighting factor 
is assigned to each connection to describe the difficulty with which the solution 
can be implemented. 

Now, we may associate a resolution complexity factor to each extended 
function of our model providing additional information, so that an appropriate choice 
can be made. For the E-payment example, there is one function added by the 
adapter at the solution-level: Calculating Credit. We suppose that there is a 
mismatch associated with this connection and the mismatch resolution 
technique is a wrapper for this case; hence the relative complexity (RelCplx) will be 6 
(from [7]). When more than one mismatch is associated to the same connection, 
or when there is more than one connection analysed for the same adapter, we 
suggest the Functional Adaptability Complexity (FAC) - related to the adapter 
- as the sum of all individual connection complexities on the table. 

3 Note that calculating $\dagger\, funct(F_{S_C}^{S_{K_i}})$ also implies a selection when more than one 
component is able to provide the same required functionality. 
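
The FAC suggestion can be written as a sum. The index sets $Conn(A)$ (the connections analysed for adapter $A$) and $Mis(c)$ (the mismatches on connection $c$) are names introduced here for illustration only; $RelCplx$ is the relative complexity taken from [7].

```latex
FAC(A) = \sum_{c \in Conn(A)} \; \sum_{m \in Mis(c)} RelCplx(m)
```

For the E-payment example there is one connection (Calculating Credit) with one mismatch resolved by a wrapper, so under this reading $FAC = 6$.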

We should note here that the PIC Software Productivity Consortium’s 
project [3] has recently determined an estimate for the effort required to 
integrate each potential component into the existing system architecture. The 
estimate of integration complexity considers several factors and the mismatch 
resolution involves semantic as well as syntactic adaptation. Therefore, our work 
might be considered as a more specific proposal that could be used along with 
other current research efforts on measuring COTS component integration, such 
as the BASIS approach. 

Measuring architectural adaptability. Adaptability of an architecture can be 
traced back to the requirements of the software system for which the architecture 
was developed. The POMSAA (Process-Oriented Metrics for Software 
Architecture Adaptability) framework [6] meets the need for tracing by adopting the 
NFR framework, which is a process-oriented qualitative framework for representing 
and reasoning about NFRs (non-functional requirements) (see footnote 4). 

In the NFR Framework, the three tasks for adaptation become softgoals to 
be achieved by a design for the software system. An adaptable component of a 
software system should satisfice these softgoals for adaptation. Another point to 
be observed is that design softgoals are decomposed in a manner similar to the 
decomposition of the NFR softgoals. One of the softgoals to be decomposed is 
adaptability, which can be further described in terms of semantic adaptability, 
syntactic adaptability, contextual adaptability and quality adaptation. 

Our proposal suggests analysing each branch of the hierarchy of semantic 
adaptability of the NFR softgoal graph in terms of complexity and size, as 
we have previously defined. In this way, qualitative judgments on architectural 
adaptability would be based on more precise and objective values. After that, 
the architectural adaptability will characterise the system’s stability at the higher 
level, conceptualised in terms of its functionalities for the system’s users. 

5 Conclusions and Future Work 

We have presented a preliminary suite of measures for determining the functional 
adaptability of a component-based solution. The suite of measures might be 
integrated into other approaches - such as BASIS - and the final calculation might 
be applied to weigh architectural decisions when calculating measures for 
architectural adaptability - as the work in [6] suggests. 

Of course, there are some points for further research. On the one hand, we should 
note that our measures are based on counting functional mappings, and their 
domains, which are specified at different levels of abstraction, could distort the final 
measure. A more formal specification of input/output values, as well as of the 
relationships between them, would therefore reduce ambiguity when calculating the 
measures. Here some related work might be helpful, such as the proposal in 
[10], which measures a refinement distance by measuring both the requirements 
of SC that are left unfulfilled by /C and the functional features of /C that are 
irrelevant to SC. On the other hand, functional mappings come from scenarios, 
whose generation and documentation also need further discussion. 

4 For a more detailed description of NFRs, we refer the reader to [5]. 

Finally, our measures and the procedure need further validation. In order 
to demonstrate the applicability of our proposal, some empirical studies are 
currently being carried out in the domain of E-payment systems. 

Abstract. The standard approach to code reuse in object oriented languages is 
via inheritance. This is restrictive in a number of ways. For example, it leads to 
well known conflicts between subtyping and subclassing. Furthermore, where 
no type relationship exists, programmers must resort to inefficient techniques 
such as delegation to achieve code reuse. This paper describes how the 
language Timor decouples subtyping and code reuse and presents a new 
concept known as reuse variables, showing how these can be used to eliminate such 
restrictions in object oriented and component oriented contexts. 



1 Introduction 

One of the main advantages of inheritance in object oriented languages is often con- 
sidered to be code reuse, i.e. subclassing. Yet, as is well known, e.g. [3], inheritance is 
an overloaded concept, and subclassing often conflicts with other uses of inheritance, 
especially subtyping. In designing the programming language Timor the authors have 
adopted the view that subtyping is the more fundamental inheritance concept and that 
code reuse is better achieved by an entirely different mechanism, known as reuse 
variables. 

In section 2 we present a brief guided tour of those Timor concepts needed to un- 
derstand the rest of the paper. Section 3 then presents the basic idea of type reuse 
variables. Section 4 compares them with delegation. In section 5 it is shown that the 
traditional conflict between subtyping and subclassing can be trivially resolved with 
reuse variables. The concept of implementation reuse variables is introduced in sec- 
tion 6, which illustrates how conventional subclassing (including overriding) can be 
simulated and how different subtypes of a common supertype which have no subtyp- 
ing relationship to each other can reuse and modify code. Section 7 briefly hints at 
further applications of reuse variables. The paper concludes with a discussion of re- 
lated work in section 8 and some final remarks in section 9. 



2 A Guided Tour of Relevant Timor Concepts 

Timor has been designed primarily as a language for supporting the development of 
software components. Wherever feasible, Java and C++ have been used as its basic 
models, but it is structurally a quite different language. One of its aims is to support a 
components industry which develops general purpose software for reuse in many 
different application systems. This aim has fundamentally influenced the main features 
of the language. In this section we briefly outline those features of the language 
necessary for understanding the rest of the paper. Other unusual features of Timor are 
described on the Timor website 1 and in papers referred to there. 

Timor has abandoned the traditional OO class construct by decoupling interfaces 
and their implementations. Components designed at an abstract level can often have 
quite different implementations (e.g. a type Queue can be implemented as a linked list, 
as an array, etc.) and a component designer may well wish to produce and distribute 
several different implementations for the same type. Consequently there is a rigorous 
distinction between an interface and its potentially multiple implementations. Inter- 
faces and implementations are formulated according to the information hiding princi- 
ple [10]. An interface is either a type (which may be concrete or abstract) or a view. 

Concrete types are units from which multiple instances can be constructed. Con- 
structors, known in Timor as makers, have individual names and are listed in a section 
introduced by the keyword maker. A maker returns an instance of its own type. If a 
maker is not explicitly defined in a concrete type the compiler automatically adds a 
parameterless maker with the name init. 

The instances of a type are manipulated by methods defined in an instance sec- 
tion. Instance methods must be designated by the keyword op (i.e. operations which 
can change state variables of the instance) or the keyword enq (enquiries which can 
read but not change state variables). 

It is well known that binary methods are problematic [1]. Timor supports binary 
type methods, i.e. methods that access multiple instances of the type (e.g. to compare 
them). A compile time error occurs if a programmer attempts to define an instance 
method which has a parameter (value or reference 2 ) or local variable of its own type or 
a supertype thereof. 

Abstract types are intended to be abstractions of "complete" types (e.g. a collection 
as an abstraction for a set, a bag, a list, etc.) and although they do not have real mak- 
ers, they can predefine makers for concrete types derived from them [6]. They have 
instance methods which are inherited and binary methods which can (but need not) 
predefine methods for derived types. 

A view is an interface which defines a related set of instance methods that can use- 
fully be incorporated into different types. It is intended to encourage the idea of "pro- 
gramming to interfaces". Views can have instance methods, but not makers or binary 
methods. 

In Timor a type can be derived from other types by extension and/or inclusion. The 
resulting units are known as derived types and those from which they are derived are 
their base types. The keyword extends defines a subtyping relationship and is 
intended to be used to signal behavioural subtyping (in the sense of Liskov and Wing 
[9]), though this cannot be checked by a compiler. Instances of a type derived by 
extension can be assigned to variables of the supertype(s). The keyword includes 
allows an interface to be inherited without implying a subtyping relationship. In this 
case the derived unit cannot be used polymorphically as if it were an instance of the 
base unit [5]. Timor also supports multiple type inheritance [6, 7]. 

1 see www.timor-programming.org 

2 References are not directly related to physical addresses. They are logical references to 
objects. They cannot be directly manipulated as if they were addresses. 

An interface (type or view) can have multiple implementations (introduced by the 
keyword impl), which should be behaviourally equivalent, in the sense that each 
fulfils the interface's specification. Like behavioural conformity, behavioural equivalence 
cannot be checked by a compiler. 

An implementation includes a state section, in which the instance variables 
needed to represent the state of individual objects of the type are declared. These can 
be value variables (including the reuse variables discussed in this paper) and refer- 
ences to other objects. The code of instance methods (both interface and internal) 
appears in an instance section. Different implementations of the same type are not 
directly related to each other (although one implementation of a type can contain reuse 
variables giving access to other implementations of the same type). 

Types can be instantiated either as separate objects or as values which can be re- 
garded as components of an object. Objects are always accessed via references. An 
object can contain references to other objects of the same or different types. 



3 Type Reuse Variables 

An implementation of a type in Timor is always a "complete" implementation in the 
sense that its state section contains all the variables necessary to represent the state of 
an instance of the type and there is an implementation of all the methods. Different 
implementations of the same type (e.g. List) each have separate state sections and 
these can contain entirely unrelated variables (e.g. one an array, another a linked list, 
etc). Similarly, implementations of a subtype are "complete" implementations of the 
subtype and are in principle independent of implementations of the supertype(s). Thus 
the implementations of a type and of its subtypes each have separate state sections 
which can contain entirely unrelated variables, and state variables are not inherited. In 
this sense subclassing is not supported and code is never automatically inherited. 

Reuse variables are provided as a flexible means of supporting the reuse of code 
and state. A reuse variable is a variable whose type name is prefixed by a "hat" 
symbol ( ^ ). It can appear in a state section, but not as a local variable declared in a 
method. What makes a reuse variable special is that the compiler can treat its public 
instance methods (but not its makers or binary methods) as implementations of the 
public methods of the type being implemented. In every other respect a reuse variable 
is just an ordinary state variable declared by value. 

More than one reuse variable can appear in an implementation. The compiler uses 
the following matching procedure to determine how the public instance methods of a 
type (which are defined in the type definition) are implemented. 

First it searches the instance section of the implementation, matching those public 
instance methods of the type being implemented which are explicitly coded in the 
implementation. If at this stage any of the public instance methods of the type remain 
unmatched, the compiler then selects the first reuse variable in the implementation and 
checks whether any of its public instance methods match any of the remaining public 
instance methods of the type. Those which match (if any) are selected as the 
implementations of the corresponding public instance methods. If any unmatched methods 
remain, the next reuse variable is examined in the same way. This process continues 
until all the public instance methods of the type being implemented have been 
matched or until no further reuse variables are found. If methods remain unmatched, 
the compiler reports the error that the implementation is incomplete. 

A successful match in this context means that the signatures and return types of the 
methods are identical. (The types and order of parameters are considered to be part of 
the signature, but not their identifiers.) A match can occur if a public method of a 
reuse variable is defined to throw the same or a subset of the exceptions thrown by the 
method for which an implementation is being sought, as defined in the type definition. 
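
The matching procedure described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the Timor compiler: the names Method, matches and resolve are hypothetical, methods are assumed not to be overloaded (so a name identifies a method), and types and exceptions are represented as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """A public instance method: name, parameter types in order, return type, exceptions."""
    name: str
    params: tuple
    returns: str
    throws: frozenset = frozenset()

    def signature(self):
        # Parameter identifiers are deliberately absent: only types and order count.
        return (self.name, self.params, self.returns)


def matches(required, candidate):
    """Identical signature and return type; candidate throws the same or fewer exceptions."""
    return (required.signature() == candidate.signature()
            and candidate.throws <= required.throws)


def resolve(type_methods, explicit_methods, reuse_variables):
    """Return a dict mapping each required method name to the source that implements it.

    reuse_variables is a list of (variable_name, public_methods) pairs in
    declaration order. Assumes no overloading, for brevity.
    """
    bindings = {}
    remaining = list(type_methods)
    # Explicitly coded methods are searched first, then each reuse variable in turn.
    sources = [("<instance section>", explicit_methods)] + list(reuse_variables)
    for source_name, offered in sources:
        still_unmatched = []
        for required in remaining:
            if any(matches(required, c) for c in offered):
                bindings[required.name] = source_name
            else:
                still_unmatched.append(required)
        remaining = still_unmatched
        if not remaining:
            break
    if remaining:
        missing = ", ".join(m.name for m in remaining)
        raise TypeError(f"implementation is incomplete: {missing}")
    return bindings


# The public instance methods of Bag, and a List offering a superset of them.
bag = [
    Method("insert", ("ELEMENT",), "void", frozenset({"OverflowEx"})),
    Method("remove", ("ELEMENT",), "void", frozenset({"NotFoundEx"})),
    Method("contains", ("ELEMENT",), "boolean"),
    Method("size", (), "int"),
]
list_methods = bag + [
    Method("removeAtPos", ("int",), "void", frozenset({"OutOfBoundsEx"})),
]
bindings = resolve(bag, [], [("aList", list_methods)])
```

With no explicitly coded methods, every Bag method is bound to the single reuse variable; coding insert explicitly would bind it to the instance section instead, because that section is searched first.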



4 Reuse Variables as an Alternative to Delegation 

Delegation is a technique which can be used in object oriented programs when the 
code of one class can usefully be reused in the implementation of another class. To 
illustrate this, consider a type List (which can contain duplicate elements and allows 
clients to insert elements at specified positions) defined along the following lines: 

    type List<: ELEMENT :> {
      instance:
        op void insert(ELEMENT e) throws OverflowEx;
          // inserts e at end of list
        op void insertAtPos(ELEMENT e; int pos)
          throws OutOfBoundsEx, OverflowEx;
          // inserts e at position pos in list
        op void remove(ELEMENT e) throws NotFoundEx;
          // removes e from list
        op void removeAtPos(int pos) throws OutOfBoundsEx;
          // removes the element at position pos from the list
        enq ELEMENT getAtPos(int pos) throws OutOfBoundsEx;
          // returns the element at position pos
        enq int position(ELEMENT e) throws NotFoundEx;
          // returns the (first) position at which e occurs
        enq boolean contains(ELEMENT e);
          // returns true if e in list
        enq int size(); // returns the number of elements in the list
      maker:
        init(int maxSize);
    }

This type might have several implementations (e.g. as an array, a linked list, etc.). 

Suppose that the programmer now wishes to define and implement a type Bag. This 
could be defined in terms of that subset of the methods used to define List which are 
not necessarily concerned with a position, e.g. 

    type Bag<: ELEMENT :> {
      instance:
        op void insert(ELEMENT e) throws OverflowEx; // inserts e into bag
        op void remove(ELEMENT e) throws NotFoundEx; // removes e from bag
        enq boolean contains(ELEMENT e);
        enq int size();
      maker:
        init(int maxSize);
    }

A simple way to implement the type Bag in the standard object oriented paradigm 
would be to delegate the invocations of the Bag instance methods to an internal List 
variable: 

    impl BagImpl<: ELEMENT :> of Bag<: ELEMENT :> {
      state:
        List<: ELEMENT :> aList;
      instance:
        op void insert(ELEMENT e) throws OverflowEx {aList.insert(e);}
        op void remove(ELEMENT e) throws NotFoundEx {aList.remove(e);}
        enq boolean contains(ELEMENT e) {return aList.contains(e);}
        enq int size() {return aList.size();}
      maker:
        init(int maxSize) {aList.init(maxSize);}
    }

The same effect can be achieved more easily and more efficiently by means of Timor's 
reuse variables: 

    impl BagImpl2<: ELEMENT :> of Bag<: ELEMENT :> {
      state:
        ^List<: ELEMENT :> aList;
      maker:
        init(int maxSize) {aList.init(maxSize);}
    }

Here all the public instance methods of the type Bag are matched from those of ^List 
aList, and are used as the public methods of Bag for this implementation. 

The Timor approach is easier than normal delegation because it saves the pro- 
grammer the effort of writing unnecessary code, and it is more efficient because at 
run-time the appropriate instance method is invoked directly. 
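
For readers more familiar with mainstream languages, the contrast can be approximated in Python. The sketch below is only an analogy under stated assumptions: ArrayList, BagByDelegation, BagByReuse, OverflowEx and the decorator reuses are hypothetical names, and the decorator fills in missing methods when the class is defined rather than at compile time. Unlike Timor, each generated method still forwards the call, so the sketch illustrates the saving in hand-written code but not the run-time efficiency gain.

```python
class OverflowEx(Exception):
    """Raised when a collection is full (hypothetical stand-in)."""


class ArrayList:
    """A stand-in for some List implementation (not Timor's)."""

    def __init__(self, max_size):
        self._items = []
        self._max_size = max_size

    def insert(self, e):
        if len(self._items) >= self._max_size:
            raise OverflowEx("list is full")
        self._items.append(e)

    def remove(self, e):
        self._items.remove(e)  # raises ValueError if e is absent

    def contains(self, e):
        return e in self._items

    def size(self):
        return len(self._items)


class BagByDelegation:
    """Hand-written delegation: every Bag method forwards explicitly."""

    def __init__(self, max_size):
        self._a_list = ArrayList(max_size)

    def insert(self, e):
        self._a_list.insert(e)

    def remove(self, e):
        self._a_list.remove(e)

    def contains(self, e):
        return self._a_list.contains(e)

    def size(self):
        return self._a_list.size()


def reuses(attribute, method_names):
    """Class decorator: supply every listed method the class does not define itself."""
    def decorate(cls):
        for name in method_names:
            if name in vars(cls):
                continue  # explicitly coded methods take precedence
            def forward(self, *args, _name=name, **kwargs):
                return getattr(getattr(self, attribute), _name)(*args, **kwargs)
            forward.__name__ = name
            setattr(cls, name, forward)
        return cls
    return decorate


BAG_METHODS = ("insert", "remove", "contains", "size")


@reuses("_a_list", BAG_METHODS)
class BagByReuse:
    """Only the maker is written by hand; the rest is supplied from _a_list."""

    def __init__(self, max_size):
        self._a_list = ArrayList(max_size)
```

Both classes behave identically to a client; the second simply omits the four forwarding methods, much as BagImpl2 omits the instance section.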



5 Subtyping and Subclassing 

A List can in fact be regarded as a subtype of a Bag, e.g. 

    type List<: ELEMENT :> {
      extends: Bag<: ELEMENT :>;
      instance:
        op void insertAtPos(ELEMENT e; int pos)
          throws OutOfBoundsEx, OverflowEx;
        op void removeAtPos(int pos) throws OutOfBoundsEx;
        enq ELEMENT getAtPos(int pos) throws OutOfBoundsEx;
        enq int position(ELEMENT e) throws NotFoundEx;
      maker:
        init(int maxSize);
    }

This illustrates an interesting point, viz. that reuse variables can be used to reverse the 
subtyping and subclassing relationship: List is a subtype of Bag, and an 
implementation of the subtype is used to implement the supertype (see BagImpl2). 

Notice that any implementation of the subtype can be used to implement the 
supertype in these examples. However, the reverse is not true: an arbitrary 
implementation of Bag cannot be directly used (without overriding) to implement List. The 
reason is that whereas it is irrelevant at the type level where insertions are placed in a 
Bag, an arbitrary implementation of the insert method of Bag does not guarantee that 
the new element will be inserted correctly into a position-sensitive List (here the final 
position). 

Two aspects of the design of Timor suggest that it is possible to generalise this 
relationship between subtyping and code reuse. First, the separation of the two concepts 
makes it possible to reverse the subtyping/subclassing relationship. Second, the 
guarantee that binary methods can only be defined as type methods (see section 2) ensures 
that well-known problems can be avoided with respect to instance methods [1]. Under 
these circumstances, and provided that a subtype is a genuine behavioural subtype in 
the sense defined in [9], and that it is mutable with respect to the key features of the 
type, it appears that it may always be possible to reuse any subtype implementation 
(unchanged) to implement the corresponding methods of a supertype 3 . With automatic 
subclassing (and the possibility of defining binary methods as instance methods) this 
potential advantage is lost in the standard OO paradigm. 

In contrast, it is possible to reuse any implementation of the supertype to implement 
a subtype, in accordance with the conventional subclassing paradigm, only when the 
behaviour in the supertype remains unchanged in the subtype. This condition is 
likely to be met, for example, when a type Person has a subtype Student. The latter 
could normally be implemented as follows: 

    impl StudentImpl of Student {
      state:
        ^Person aPerson;
      instance:
        // the Student public methods added in the subtype
      maker:
        init() {aPerson.init();}
    }

However, when the subtype's behaviour is more constrained, the standard subclassing 
paradigm has to resort to overriding, and this can have the disadvantage that a detailed 
knowledge of a specific implementation of the supertype is assumed. In such cases this 
can be considered inferior to the Timor technique in which type reuse variables of a 
behaviourally conforming subtype are used to implement a supertype. 

6 Implementation Reuse Variables 

In the previous examples the matching occurs without reference to a particular imple- 
mentation of the type of the reuse variable, and any implementation can actually be 
reused. (How actual implementations are chosen is determined by other mechanisms 
not relevant to this paper.) Hence we refer to such variables as type reuse variables. 

Timor supports the idea of overriding by allowing implementation reuse variables 
to be declared. In principle these are similar to type reuse variables, except that the 
"type" of the reuse variable is named as an implementation. This gives the program- 
mer access to internal state (and private methods) of this implementation. 



3 We intend to investigate the accuracy of this as yet unproven assertion in future work. 

With this technique conventional subclassing (with overriding) can easily be 
simulated, as the following example illustrates. We begin with an (outline) implementation 
of Bag (here using an array as the basis for the implementation): 

    impl BagImpl3<: ELEMENT :> of Bag<: ELEMENT :> {
      state:
        ELEMENT[] theArray;
        int elementCount = 0;
        int maxSize;
        // other state information
      instance:
        op void insert(ELEMENT e) throws OverflowEx
          {/* inserts e into theArray */}
        op void remove(ELEMENT e) throws NotFoundEx
          {/* removes e from theArray */}
        enq boolean contains(ELEMENT e) {/* checks if e in theArray */}
        enq int size() {/* returns the number of elements in theArray */}
      maker:
        init(int maxSize) {
          this.maxSize = maxSize;
          theArray = ELEMENT[].init(maxSize);
        }
    }

Using conventional subclassing this could be extended in a subclass List. Since for a 
Bag there is no logical beginning or end position, it is unlikely that the insert method 
of BagImpl3 would work correctly for List, which is defined to append new items at 
the logical (position-sensitive) end. Consequently (at least) the insert method would 
have to be overridden. This is how it might appear in Timor: 

    impl ListImpl1<: ELEMENT :> of List<: ELEMENT :> {
      state:
        ^BagImpl3<: ELEMENT :> theBag; // an implementation reuse variable
        // additional state information, e.g.
        int firstElement = 0; // the array index of the first element
        int firstFree = 0; // the array index of the last element
      instance:
        op void insert(ELEMENT e) {
          // this overrides the insert method of theBag
          with (theBag) {
            if (elementCount < maxSize) {
              theArray[firstFree] = e;
              elementCount++;
              firstFree = ...
            }
            else throw new OverflowEx.init();
          }
        }
        // additional instance methods, e.g.
        op void insertAtPos(ELEMENT e; int pos)
          throws OutOfBoundsEx, OverflowEx {
        }
      maker:
        init(int maxSize) {theBag.init(maxSize);}
    }

The "overriding" code appears in the instance section of the new implementation, 
and this gives access not only to the public methods of implementation reuse variables 
but also to their internal data structures (e.g. theArray, elementCount) and where 
appropriate their internal methods. Access to these is via the dot notation (e.g. 
theBag.theArray), but as the example illustrates, the code becomes shorter when the 
(Pascal-like) with statement is used. 
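
The effect of an implementation reuse variable can be approximated in Python, again only as a sketch under stated assumptions. The classes below are hypothetical counterparts of BagImpl3 and ListImpl1; the bag's policy of inserting at the front is an invented example of a placement that is legitimate for a bag but wrong for a list, and Python has no equivalent of the with statement, so a local alias plays that role.

```python
class OverflowEx(Exception):
    """Raised when a collection is full (hypothetical stand-in)."""


class BagImpl3:
    """Array-backed bag; where a new element lands is irrelevant at the type level."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.the_array = []

    def insert(self, e):
        if len(self.the_array) >= self.max_size:
            raise OverflowEx("bag is full")
        # Assumed policy for illustration: new elements go to the front.
        self.the_array.insert(0, e)

    def remove(self, e):
        self.the_array.remove(e)  # raises ValueError if e is absent

    def contains(self, e):
        return e in self.the_array

    def size(self):
        return len(self.the_array)


class ListImpl1:
    """List built on a specific bag implementation, overriding only insert."""

    def __init__(self, max_size):
        self.the_bag = BagImpl3(max_size)
        # Methods that are not overridden are taken over unchanged from the_bag.
        self.remove = self.the_bag.remove
        self.contains = self.the_bag.contains
        self.size = self.the_bag.size

    def insert(self, e):
        bag = self.the_bag  # local alias, standing in for "with (theBag)"
        if len(bag.the_array) >= bag.max_size:
            raise OverflowEx("list is full")
        bag.the_array.append(e)  # a list appends at the logical end

    def get_at_pos(self, pos):
        items = self.the_bag.the_array
        if not 0 <= pos < len(items):
            raise IndexError("position out of bounds")
        return items[pos]
```

The override depends on the internal representation of one particular bag implementation, which is exactly the knowledge that an implementation reuse variable makes available and a type reuse variable does not.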

The use of implementation reuse variables is not confined to situations which 
simulate subclassing, but can for example be used where no direct super/subtype rela- 
tionship exists. The technique is used for example in the Timor Collection Library 
(TCL) to provide implementations of the type SortedList (where insertions are 
automatically ordered according to a user supplied ordering rule) as variants of the 
List implementations. In the TCL both List and SortedList are subtypes of the 
abstract type Collection, but neither is a subtype of the other [6]. An implementation 
of SortedList has a pattern along the following lines: 

    impl SortedListImpl1<: ELEMENT :> of SortedList<: ELEMENT :> {
      state:
        ^ListImpl1<: ELEMENT :> theList;
          // an implementation reuse variable
      instance:
        op void insert(ELEMENT e) {
          // this overrides the insert method of theList
          if (elementCount < maxSize) {
            sortIntoList(e); // this is an internal method which
                             // sorts e into the list at the appropriate point
          }
          else throw new OverflowEx.init();
        }
      maker:
        init(int maxSize) {theList.init(maxSize);}
    }



7 Further Applications of Reuse Variables 

Lack of space forbids us to illustrate the full power of reuse variables. Two particu- 
larly important advantages, which will be described in a future paper, allow Timor to 
support both multiple code reuse and repeated inheritance with code reuse in a 
straightforward manner. 

Also, Timor provides a mechanism for mapping the names of methods, with the 
effect that the delegation technique described in section 4 can also easily be used in 
situations where in a conventional class based OO language such as Java the 
programmer might use the adapter pattern [4], but again with the efficiency advantage 
that at run-time the appropriate instance methods are invoked directly. 



8 Relation to Other Work 

As was indicated in section 4, reuse variables can be seen as an efficient mechanism 
for some cases of delegation, and they can be used either to imitate subclassing or to 
reverse subclassing. Timor's separation of types and implementations allows these 
aspects of inheritance to be handled orthogonally. Although this separation has 
previously appeared in other languages, notably Theta [8] and Tau [11], they do not 
support a concept of reuse variables. 
However, the idea of reuse variables can be viewed as a development of the code 
reuse technique proposed by Schmolitzky in [11]. It differs from his proposal by 
adding the idea of state to the reuse technique, thus opening the way for a straightforward 
implementation of repeated inheritance, and simplifying the constructs needed, e.g. by 
making special support for "super" superfluous (which is especially helpful in the 
context of multiple and repeated implementation inheritance). 

Reuse variables superficially resemble the delegation technique used in prototype 
based languages [2, 12]. In each case "missing" interface methods are supplied from 
variables declared within the implementation. However, the differences are over- 
whelming. In Timor this is merely an implementation technique, not affecting type 
relationships; reuse variables are implemented by value, not by pointer; the search for 
methods occurs at compile-time, not at run-time; the search is not recursive, etc. 
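
To make the contrast concrete, the following Python sketch models the prototype-style delegation that reuse variables are being distinguished from. It is an illustration only; the class Proto and its parent slot are hypothetical names. Lookup happens on every call, follows a reference to the parent, and recurses up the chain, which are the three properties that the text says do not hold for reuse variables.

```python
class Proto:
    """Prototype-style object: unknown attributes are looked up in the parent at run time."""

    def __init__(self, parent=None, **slots):
        self.__dict__.update(slots)
        self.parent = parent  # held by reference, not by value

    def __getattr__(self, name):
        # Invoked only when normal lookup fails; recurses up the parent chain.
        parent = self.__dict__.get("parent")
        if parent is None:
            raise AttributeError(name)
        return getattr(parent, name)


base = Proto(greet=lambda: "hi")
middle = Proto(parent=base)
leaf = Proto(parent=middle)

first = leaf.greet()           # found two levels up, at call time
base.greet = lambda: "hello"   # changing the prototype later ...
second = leaf.greet()          # ... changes what the leaf does
```

Because the search is repeated at run time through shared references, a later change to base is visible through leaf. With reuse variables the match is fixed once by the compiler against a by-value state variable, so no comparable effect arises.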

9 Final Remarks 

The paper has presented an alternative to subclassing which simplifies the reuse of 
code in object oriented systems, especially in cases where subclassing and subtyping 
conflict and where code can reasonably be used although an appropriate subtyping 
relationship may not be present. At the same time it provides an efficient alternative to 
some forms of delegation and it easily handles multiple and repeated implementation 
inheritance. 

Abstract. Reuse between software systems is often not optimal. An important 
reason is that while at the functional level well-known modularization principles 
are applied for structuring functionality in modules, this is not the case at the build 
level for structuring files in directories. This leads to a situation where files are 
entangled in directory hierarchies and build processes, making it hard to extract 
functionality and to make functionality suitable for reuse. Consequently, software 
may not become available for reuse at all, or only in rather large chunks of 
functionality, which may lead to extra software dependencies. 

In this paper we propose to improve this situation by applying component-based 
software engineering (CBSE) principles to the build level. We discuss how existing 
software systems break CBSE principles, we introduce the notion of build-level 
components, and we define rules for developing such components. To make our 
techniques feasible, we define a reengineering process for semi-automatically 
transforming existing software systems into build-level components. Our tech- 
niques are demonstrated in a case study where we decouple the source tree of 
Graphviz into 47 build-level components. 



1 Introduction 

Modularity is a prerequisite for component technology [17]. Already in 1972, Parnas 
introduced the modularization principles of minimizing coupling between modules and 
maximizing cohesion within modules [16]. The former principle states that dependencies 
between modules should be minimized, the latter principle states that strongly related 
things belong to the same module. These principles are well understood at the functional 
level for structuring functionality in functions or methods and in modules or classes. 

Unfortunately, these principles are usually not applied at the build level for structuring 
modules and classes in directories. As a result, bad programming practices such as 
strong coupling and weak cohesion often carry over from the functional level to the build level. 

In practice, many software systems therefore consist of large collections of files that 
are structured rather ad-hoc into directory hierarchies. Between these directories a lot of 
references exist (= strong coupling) and directories often contain too many files (= weak 
cohesion). Build knowledge gets unnecessarily complicated due to improper structuring 
in monolithic configuration files and build scripts. 

* This research was sponsored by the Dutch National Research Organization (NWO), Jacquard 
project TraCE. 

As a result, modules are entangled, the composition of directories is fixed, and build 
processes are fragile. This yields a situation where: i) potentially reusable code, contained 
in some of the entangled modules, cannot easily be made available for reuse; ii) the fixed 
nature of directory hierarchies makes it hard to add or to remove functionality; iii) the 
build system will easily break when the directory structure changes, or when files are 
removed or renamed. 

To improve this situation, we can learn from component-based software engineering 
(CBSE) principles. In CBSE, functionality is only accessed via well-defined interfaces, 
and one cannot depend on the internal structure of components. Unfortunately, CBSE 
principles are not yet applied at the build level. Reusability of components is therefore 
hampered, even when CBSE principles are applied at the functional level. 

For example, the ASF+SDF Meta-Environment [1] is a generic framework for lan- 
guage tool development. It contains generic components for parsing, pretty-printing, 
rewriting, debugging, and so on. Despite their generic nature and their component-based 
implementation, they were not reusable in other applications due to their build-level en- 
tangling in the ASF+SDF Meta-Environment. After applying the CBSE principles dis- 
cussed in this paper, they became distinct components, which are now reused in several 
different applications [10]. Graphviz [4] is another example. It is too large, yielding too 
many external dependencies. In this paper we demonstrate how its implementation can 
be restructured such that its individual parts can be separately reused. 

In this paper we discuss how to apply CBSE principles to the build level, such 
that access to files only occurs via interfaces, and dependencies on internal directory 
structures can be dropped. We also describe a composition technique for assembling 
software systems from build-level components. CBSE principles at the build level help 
to improve reuse practice because build-level components can be reused individually 
and be assembled into different software systems. To make our techniques feasible, we 
propose a semi-automatic technique for decoupling existing software systems into 
build-level components. We demonstrate our ideas by means of a case study, where Graphviz 
(300,000+ LOC) is migrated to 47 build-level components. 

The paper is structured as follows. In Sect. 2 we introduce the concept of build- 
level components. In Sect. 3 we discuss bad programming practice and we introduce 
development rules for build-level components with strong cohesion and weak coupling. 
In Sect. 4 we discuss automated composition of build-level components. In Sect. 5 
we present a semi-automatic process for decoupling software systems into build-level 
components. In Sect. 6 we demonstrate our ideas by means of a case study. In Sect. 7 
we summarize our results and discuss related work. 

2 Build-Level Components 

According to Szyperski [17], the characteristic properties of a component are that it: i) 
is a unit of independent deployment; ii) is a unit of third-party composition; iii) has no 
(externally) observable state. He gives the following definition of a component: 

“A software component is a unit of composition with contractually specified 
interfaces and explicit context dependencies only. A software component can 
be deployed independently and is subject to composition by third parties.” 

Component-based software engineering (CBSE) is mostly concerned with 
execution-level components (such as COM, CCM, or EJB components). We propose to apply CBSE 
principles also to the build level (i.e., to directory hierarchies containing ingredients of an 
application’s build process, such as source files, build and configuration files, libraries, 
and so on). Components are then formed by directories and serve as units of composition. 

Access to build-level components occurs via build, configuration, and requires 
interfaces. Build interfaces serve to execute actions of a component’s build process (e.g., 
to build or install a component). Configuration interfaces serve to control how a 
component should be built (i.e., to support build-time variability). Requires interfaces serve 
to bind dependencies on other components. Referencing other components no longer 
occurs via hard and fixed directory references, but only via the dependency 
parameters of a requires interface. Dependency parameters allow late binding by third parties. 
Since all component access occurs via interfaces, build-level components can be 
independently deployed and their internal structure can safely be changed. Directories with 
these properties satisfy the component definition of [17] and can be used for build-level 
CBSE. 

This paper is concerned with developing build-level components and with extracting
such components from existing applications. Our work is based on the GNU Autotools,
which support build-time configuration (Autoconf) and software building (Automake).
Strictly speaking, the GNU Autotools are not essential, but they make life much easier.

Autoconf [12] is a popular configuration script generator that produces a top-level
configuration script for a software system. The script is used to instantiate Makefiles
with a concrete configuration. The input to Autoconf is a configuration script in which,
among other things, configuration switches and checks can be defined. We use Autoconf
because it provides a consistent approach to build-time configuration (i.e., all software
systems driven by Autoconf can be configured similarly). This simplifies composition of
build-level components (see Sect. 3).

Automake [13] is a Makefile generator. Its input is a high-level build description from 
which standard Makefiles are generated conforming to the GNU Makefile Standards [3].
The benefits of Automake are that it simplifies the development of build processes, and 
that it standardizes the process of software building. The latter is of great importance 
for CBSE and the main reason that we depend on Automake. Build processes generated 
by Automake always provide the same set of build actions. Automake thus generates 
standardized build interfaces. Having standardized build interfaces enables composition 
of build-level components (see Sect. 3). 



3 Build-Level Development Rules 



Many software systems break CBSE principles at the build level. This results in stronger 
coupling and weaker cohesion. In this section we analyze typical build-level practices 
that break these principles, and we provide component development rules that enable 
CBSE at the build level.

3.1 Component Granularity

Pitfall: Components with Other than Directory Granularity. The granularity of a com- 
ponent is important for its usability [17,10]. If components are too large, then cohesion 
is weak. Consequently, by reusing them, too much functionality is obtained that is not 
needed at all. On the other hand, if components are too small then coupling will be strong 
and it may take too much effort to assemble a system from them. 

In practice, build-level component granularity is often too large. There are many examples
of software systems (e.g., Graphviz, Mozilla, and the Linux kernel [7]) where potentially
reusable functionality is not structured in separately reusable directory hierarchies.
Consequently, the complete directory hierarchy containing the implementation of a full
software system has to be used even if only a small portion of its functionality is actually
needed. Often this is not an option, and reuse does not take place.

Rule: Components with Directory Granularity. Build-level components should have
directory granularity, cohesion within directories should be strong, and coupling between
directories should be minimal. With strong cohesion, the contents of a directory form
a unit and chances are low that only a subset of the directory needs to be reused.
Minimal coupling between directories makes directories independently deployable, an
important CBSE principle. If a particular directory is not intended for individual reuse,
then it can be part of a larger directory structure. This slightly increases component
granularity, but prevents the existence of many small components that are not
actually reused individually.

3.2 Circular Dependencies 

Pitfall: Circular Dependencies. If two collections of files are separated in distinct
directories but reference each other, then they are strongly coupled. Although they form
distinct components, they cannot be used independently because each needs the other.

Such a decomposition into distinct directories breaks the modularization principle
of minimizing coupling. Basically, circular dependencies show that cohesion between
directories is strong and that they belong together (or, at least, that they should be
decomposed in another way). Circular dependencies between directories therefore almost
always indicate that something is wrong with the structure of the implementation of a
software system in files and directories.

Rule: Circular Dependencies Should Be Prevented. Striving towards weak coupling
is an important motivation for minimizing circular dependencies. One solution is
to simply merge circularly dependent directories. If this significantly reduces cohesion,
then a third directory may be constructed that captures the strongly related subparts of
both directories. Both directories then become dependent on the newly created one, but not
the other way around.
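
The rule can be checked mechanically once directory references are known. The following is a minimal Python sketch, not part of the tooling described in this paper, that reports groups of directories that reference each other directly or indirectly (the strongly connected components of the reference graph). The directory names foo, bar, and util and the function name circular_groups are illustrative assumptions.

```python
from itertools import count


def circular_groups(references):
    """Return sorted groups of directories that lie on a common cycle.

    `references` maps a directory name to the directories it references.
    Uses Tarjan's strongly-connected-components algorithm (recursive form,
    adequate for the small graphs formed by source tree directories).
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    groups = []
    counter = count()

    def visit(node):
        index[node] = lowlink[node] = next(counter)
        stack.append(node)
        on_stack.add(node)
        for target in references.get(node, ()):
            if target not in index:
                visit(target)
                lowlink[node] = min(lowlink[node], lowlink[target])
            elif target in on_stack:
                lowlink[node] = min(lowlink[node], index[target])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop its members.
            group = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                group.append(member)
                if member == node:
                    break
            # Keep only real cycles (several members, or a self-reference).
            if len(group) > 1 or node in references.get(node, ()):
                groups.append(sorted(group))

    for node in sorted(references):
        if node not in index:
            visit(node)
    return sorted(groups)


# Hypothetical example: foo and bar reference each other, util is independent.
refs = {"foo": ["bar"], "bar": ["foo", "util"], "util": []}
cycles = circular_groups(refs)
```

Each reported group is a candidate either for merging or for factoring the shared parts out into a third directory.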

3.3 Build Interface 

Pitfall: Non-standardized Build Interfaces. There are many different build systems
available, often providing incompatible build interfaces. For instance, software building
with Imake, Ant, or Automake requires execution of different sequences of build
actions. These different build interfaces hamper compositionality of build-level components,
because build process definitions cannot be composed transparently. The reason
is that internal knowledge of a build-level component is required in order to determine
which actions constitute a component’s build process. This breaks the abstraction principle
of CBSE, which prescribes that component access only goes through well-defined
interfaces.

Rule: Software Building via Standardized Build Interface. In order to make build-level 
components compositional, build process definitions should all implement the same 
build interface. This way, the steps involved in the build process are the same for each
component.

Implementation with Autotools. We standardize on the build interface offered by Automake.
This interface includes the build actions all for building, clean for removing
generated files, install for installing files, check for running tests, dist for building
distributions, and distcheck for building and validating distributions.
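
Because every component offers the same build actions, a composite build can treat all components uniformly. The following minimal sketch assumes that each component directory contains an Automake-generated Makefile with the standard targets all, check, clean, install, dist, and distcheck; the function name and the component directories are hypothetical.

```python
# Targets assumed to be provided by every Automake-generated Makefile.
BUILD_ACTIONS = ("all", "check", "clean", "install", "dist", "distcheck")


def build_commands(component_dirs, action):
    """Return one `make` invocation per component for a standardized action."""
    if action not in BUILD_ACTIONS:
        raise ValueError(f"not a standardized build action: {action}")
    # `make -C DIR TARGET` runs TARGET using the Makefile found in DIR.
    return [["make", "-C", directory, action] for directory in component_dirs]


# Hypothetical composition of three components, listed dependencies first.
commands = build_commands(["cdt", "graph", "dotneato"], "install")
```

No knowledge of a component's internals is needed to produce these commands, which is the point of the rule; each command list could be handed to subprocess.run in turn.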

3.4 Configuration Interface 

Pitfall: Non-standardized Configuration Interfaces. What holds for build processes also 
holds for configuration processes. If standardization is lacking and different configura- 
tion mechanisms are in play, then configuration is not transparent, hampering compo- 
sitionality. For configuration, knowledge of the component is then needed to determine 
the configuration mechanism used. Again, this breaks the abstraction principle of CBSE 
because access to the component outside its interfaces is inevitable. 

Rule: Compile-time Variability Binding via Standardized Configuration Interface.
Standardization of variability interfaces is needed to improve compositionality. Only then
does it become transparent how to bind the compile-time variability of varying compositions
of build-level components.

Implementation with Autotools. In this paper we standardize on the configuration interface
offered by Autoconf. This provides a standard way of binding configuration and
dependency parameters. For example, to turn a feature f on and to bind a parameter
p to some_value, an Autoconf configuration script can be executed as
./configure --enable-f --with-p=some_value. By using Autoconf, a component can be configured
individually, as well as in different compositions, and it is always clear how its
variability parameters can be bound.¹
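
Since the syntax of the configuration interface is fixed, the command line for any component can be derived from a set of bindings. A minimal sketch, assuming features are bound with --enable-NAME or --disable-NAME and parameters with --with-NAME=VALUE as in the example above; the function name is hypothetical.

```python
def configure_command(features, parameters, script="./configure"):
    """Build the argument list for an Autoconf-style configure script.

    `features` maps a feature name to True (enable) or False (disable);
    `parameters` maps a parameter name to the value it is bound to.
    """
    args = [script]
    for name, enabled in sorted(features.items()):
        args.append(f"--enable-{name}" if enabled else f"--disable-{name}")
    for name, value in sorted(parameters.items()):
        args.append(f"--with-{name}={value}")
    return args


command = configure_command({"f": True}, {"p": "some_value"})
```

The same function serves a single component and a composition alike, because both expose the same interface.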

3.5 Requires Interface 

Pitfall: Early Binding of Build-level Dependencies. A composition of directories is
often specified in source modules or in build processes. For instance, consider the C
fragment #include "../bar/bar.h" from a hypothetical component foo. This fragment
clearly defines a composition and, consequently, increases coupling between foo
and another component bar. The component bar is a build-level dependency of foo.
The composition expressed in the C fragment is therefore a form of early binding.
Early binding of dependencies increases coupling and prevents independent deployment
and third-party binding.

¹ Observe that Autoconf is not strictly necessary, because a similar configuration interface
(with the same commands and syntax) can be obtained in other ways as well.

Rule: Late Binding of Build-level Dependencies via Requires Interface. References 
to directories and files should be bound via dependency parameters of a component’s 
requires interface. This is a form of late binding that allows third parties to make a 
composition, and that caters for different directory layouts. 

Implementation with Autotools. With Automake and Autoconf this can be achieved by 
defining separate configuration switches for each required component. For instance, 
component foo can define a configuration switch for its dependency on bar as follows: 

    AC_ARG_WITH(bar, [...], BAR=${withval})
    AC_SUBST(BAR)

In the Makefile of foo the variable BAR is then used to reference the bar component. 
Dependency parameters are bound at configuration time. 
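
At configuration time, the value bound to a dependency parameter is substituted for @BAR@ placeholders in the templates from which Makefiles are instantiated. The sketch below imitates this substitution in simplified form to show that the same component can be bound to different directory layouts. It is not the actual Autoconf implementation, and the template text, paths, and function name are hypothetical.

```python
import re


def instantiate(template, bindings):
    """Replace @NAME@ placeholders by bound values; leave unbound ones intact."""
    return re.sub(
        r"@([A-Za-z_][A-Za-z0-9_]*)@",
        lambda match: bindings.get(match.group(1), match.group(0)),
        template,
    )


# Hypothetical Makefile template of component foo, which requires bar.
TEMPLATE = "BAR = @BAR@\nAM_CPPFLAGS = -I$(BAR)/include\n"

# Two third-party compositions bind the same dependency differently.
installed = instantiate(TEMPLATE, {"BAR": "/usr/local"})
bundled = instantiate(TEMPLATE, {"BAR": "$(top_srcdir)/bar"})
```

Neither binding requires a change to foo itself, which is what distinguishes late binding from the hard-coded #include path.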



3.6 Build Process Definition 

Pitfall: Single Build Process Definition. If build knowledge of a composition is central- 
ized (e.g., in a top-level Makefile), then coupling between components is increased and 
the components cannot easily be deployed individually. This is because build knowl- 
edge for a specific component needs to be extracted, which is difficult and error prone. 
Unfortunately, single build process definitions are common practice. 

Rule: Build Process Definition per Component. Build-level components need individual 
build process definitions. This way a component can be built independently of other 
components and, consequently, be part of different compositions. There are many ways 
to define an individual build process. For instance, it can be defined as a batch file, 
a traditional Makefile, or as an Automake Makefile. In this paper we use Automake 
Makefiles. 



3.7 Configuration Process Definition 

Pitfall: Single Configuration Process Definition. It is common practice to centralize 
build-time configuration knowledge of software systems. Inside the files that capture this 
configuration knowledge, it is usually not clear which configuration parameters belong 
to which directory, and which parameters are shared. This form of coupling hampers 
reuse because components cannot be deployed individually without this knowledge and 
because extracting component-specific configuration knowledge is difficult.

Rule: Configuration Process Definition per Component. A software build process often
contains numerous build-time variation points. To allow independent deployment,
the configuration process, in which such variation points are bound, needs to be
independent of other build-level components (a single configuration process definition
is acceptable only when it is generated from individual ones). To that end, each build-level
component should have an independent configuration process definition. In this paper
we use Autoconf configuration scripts.

3.8 Component Deployment 

Pitfall: Using a Configuration Management System for Component Deployment. Putting 
a software system under control of a Configuration Management (CM) system can 
increase coupling. The reason is that the directory structure (and thus the composition 
of directories) is stored in a CM system and that it is not prepared for individual use. It is 
therefore often not easy to obtain subparts from a CM system. Furthermore, it is not easy 
to make different compositions of directories controlled by a CM system. For instance, 
CVS requires administrative actions to make a particular composition. This is required 
for each composition. Finally, composition of different CM systems (for instance CVS 
with Subversion) is, to the best of our knowledge, not possible with current technology. 

Rule: Component Deployment with Build-level Packages. To allow more widespread
use of build-level components, they should be deployable independently of a CM system.
To that end, release management [5] is needed to make components available without
CM system access. Release management should include a version scheme that relates
component releases to CM revisions. In the remainder of this paper we call such a
versioned release of a build-level component a build-level package (or package for
short).

Implementation with Autotools. The combination of Autoconf and Automake already 
provides support for software versioning and for generating versioned software releases. 
Each build-level component is given a name and a version number in the Autoconf 
configure script. Automake provides the build action dist to make a versioned release. 
The distcheck action serves to validate a distribution (e.g., to check that no files are 
missing in the distribution). 

3.9 Component Composition 

Pitfall: Making a Composition by Hand. Most software systems are manual compositions
of directories, files, build processes, and configuration processes. Unfortunately,
it is difficult to define a configuration process for a composite system (the complexity
of the configuration processes of several existing software systems demonstrates that it is
no sinecure²). It is also difficult to correctly determine all software dependencies, and to
define a composite build process. Finally, build and configuration processes are often
hard to understand. These difficulties make the composition process time consuming
and error prone. In addition, the composition process is hard to reproduce, and changing
a composition, by adding new directories or removing existing ones, is costly. This
situation gets worse when the number of components increases.

² For instance, Graphviz contains 5,000 LOC related to build-time configuration, Mozilla more
than 8,000.

package
  identification
    name=dotneato
    version=1.0
    location=file:///home/mdejonge/graphviz/dist
    info=http://www.graphviz.org
    description='Graphviz dotneato package'
    keywords=graph, visualization, transformation

  configuration interface
    dmalloc 'use dmalloc for debugging memory use'
    efence 'use efence for debugging memory use'

  requires
    cdt 0.95
    gd 2.0
    graph 1.1 with optimization=nocycles
    pathplan 2.0

Fig. 1. A package definition in PDL.

Rule: Automated Component Composition. Since it is expected that the composition 
process needs to be repeated (because compositions are subject to change), a need ex- 
ists to keep the composition effort to a minimum. Automated component composition 
is therefore a prerequisite to achieve effective CBSE practice at the build level. Auto- 
mated composition makes it easy to reuse components over and over again in different 
compositions and to manage the evolution of existing compositions over time. 



4 Automated Build-Level Composition 

Building and configuring components that are developed according to the rules of
Sect. 3 can be performed solely via build and configuration interfaces. This property
allows for automated build-level composition.

To enable automated build-level composition, we developed the package definition
language (PDL) to formalize component-specific information [8]. A package definition
serves to capture component identification information, to define variability parameters
in a configuration interface, and to define dependency parameters. An example package
definition is depicted in Fig. 1.
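
The information captured by a package definition can be pictured as a small data model. The sketch below mirrors the package of Fig. 1 in Python. The class and field names are illustrative assumptions; they do not reflect the PDL grammar or the Autobundle implementation [8].

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    """A required package, optionally with parameter bindings."""
    name: str
    version: str
    bindings: dict = field(default_factory=dict)


@dataclass
class PackageDefinition:
    """Identification, configuration interface, and requires interface."""
    name: str
    version: str
    location: str
    info: str = ""
    description: str = ""
    keywords: list = field(default_factory=list)
    switches: dict = field(default_factory=dict)   # switch name -> help text
    requires: list = field(default_factory=list)   # list of Dependency


# The dotneato package of Fig. 1, transcribed into the model above.
dotneato = PackageDefinition(
    name="dotneato",
    version="1.0",
    location="file:///home/mdejonge/graphviz/dist",
    info="http://www.graphviz.org",
    description="Graphviz dotneato package",
    keywords=["graph", "visualization", "transformation"],
    switches={
        "dmalloc": "use dmalloc for debugging memory use",
        "efence": "use efence for debugging memory use",
    },
    requires=[
        Dependency("cdt", "0.95"),
        Dependency("gd", "2.0"),
        Dependency("graph", "1.1", {"optimization": "nocycles"}),
        Dependency("pathplan", "2.0"),
    ],
)


def dependency_tuples(package):
    """Return the name/version tuples of all required packages."""
    return [(dep.name, dep.version) for dep in package.requires]
```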

Build-level composition is based on component releases (packages). Hence, package
dependencies are expressed as name/version tuples, and package locations (defining
where packages can be retrieved from) are expressed as URLs. Package dependencies
may contain parameter bindings. For instance, the package definition in Fig. 1 binds the
parameter optimization of the required package graph to nocycles. Package definitions are
stored in package repositories.

1. Source tree analysis
   - Find components
   - Find component references
   - Fine-tune
2. Source tree transformation
   - Create components
   - Create package definitions
   - Fine-tune
3. Online package base creation

Fig. 2. The three phases of source tree decoupling.

The information stored in package definitions is sufficient to automate the compo- 
sition process. This process is called Source Tree Composition [8] and consists of i) 
resolving package dependencies; ii) retrieving and unpacking packages; iii) merging the 
build processes of all components; iv) merging the configuration processes of all com- 
ponents. The result of source tree composition is a directory hierarchy containing the 
build-level components (according to a transitive closure of package dependencies), and 
a top-level build and configuration process. Typical deployment tasks, such as building, 
installing, and distributing can be performed for the composition as a whole, rather than 
for each constituent component separately. Hence, the composite software system can 
be managed as a single unit. 
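
The first step of source tree composition, resolving package dependencies, amounts to computing a transitive closure and an order in which dependencies precede their dependents. The following is a minimal sketch of that step only, not the Autobundle implementation. It ignores version numbers, the repository contents are hypothetical (in particular, the dependency of graph on cdt is assumed for illustration), and it relies on graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def resolve(root, repository):
    """Return the transitive closure of the dependencies of `root`.

    `repository` maps a package name to the names of the packages it
    requires. The result has the same shape, restricted to the packages
    reachable from `root`.
    """
    closure = {}
    pending = [root]
    while pending:
        name = pending.pop()
        if name in closure:
            continue
        if name not in repository:
            raise KeyError(f"package not found in repository: {name}")
        closure[name] = list(repository[name])
        pending.extend(closure[name])
    return closure


def build_order(closure):
    """Order packages so that every package follows its dependencies."""
    return list(TopologicalSorter(closure).static_order())


# Hypothetical repository; "unused" is not reachable from dotneato.
REPOSITORY = {
    "dotneato": ["cdt", "gd", "graph", "pathplan"],
    "graph": ["cdt"],
    "cdt": [],
    "gd": [],
    "pathplan": [],
    "unused": [],
}

closure = resolve("dotneato", REPOSITORY)
order = build_order(closure)
```

TopologicalSorter raises a CycleError for circular dependencies, which matches the rule of Sect. 3.2 that such cycles must be removed before composition.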

Package repositories can be put online in the form of Online Package Bases.³ An
online package base serves as a component repository from which people can select
build-level components of interest. Then, by simply pressing a button, a composite software
system is automatically produced from the selected components.

We have implemented automated source tree composition in the tool set Autobundle.⁴
In [9] we discuss how source tree composition can be used to integrate component 
development and deployment. This improves software reuse practice and provides an 
efficient development process for build-level CBSE. 



5 Migration to Build-Level Components 

The component development rules of Sect. 3 and the composition
technique presented in Sect. 4 bring CBSE principles to the build level. Together they
allow development of software in separate reusable components and their composition
into multiple software systems. Although build-level CBSE seems promising, adapting
existing software forms a barrier to adopting the techniques
presented thus far. The question that comes to mind is: can’t we reengineer existing
software systems into build-level components automatically?



³ http://program-transformation.org/package-base

⁴ http://www.cs.uu.nl/~mdejonge/software

To that end, we present a semi-automatic technique for applying the development
rules of Sect. 3 to existing software. Fig. 2 depicts the three-phase process for decoupling
source trees into build-level components. This reengineering process analyzes the structure
of a source tree to determine candidate components, and its Makefiles to determine
component references. This information is used to split the source tree into pieces, and
to generate component-specific Makefiles and configure scripts. Below we discuss the
process in more detail.

5.1 Source Tree Analysis 

We assume that source code is structured in subdirectories and that the root directory
contains only non-code artifacts (including build knowledge). If all sources are contained
in a single directory, then additional clustering techniques can be used to group related
files into directories.

Finding Components. The structure of a source tree in directories determines the set 
of build-level components. Consider Fig. 3, where nodes denote directories, edges di- 
rectory structure, and arrows directory references. Basically, there are two approaches: 
i) each non-root directory constitutes a separate component (i.e., a, b, c, d, and e form 
components); ii) each directory hierarchy below root, without external references to its 
subdirectories constitutes a component (i.e., abc, d, and e). In the first approach, each 
directory is a candidate for potential reuse. This leads to fine-grained reuse but also to 
a large number of components. In the second approach, actual reuse information serves 
to determine what the candidates for reuse are. Since nodes b and c in Fig. 3 are not 
referenced outside the tree rooted at a, both are considered not reusable. This reduces 
the number of components, but also results in more coarse-grained reuse. In this paper 
we follow the first approach. 
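
The two approaches can be contrasted on the hierarchy described for Fig. 3. The sketch below is illustrative only: the tree is transcribed from the text, and the single reference from a to e is the one implied in Sect. 5.2; any further arrows of the figure are not reproduced. The function names are hypothetical.

```python
# Directory hierarchy described for Fig. 3: a contains b and c, d contains e.
TREE = {"root": ["a", "d"], "a": ["b", "c"], "d": ["e"], "b": [], "c": [], "e": []}

# Directory references as (from, to) pairs. Only a -> e is taken from the
# text; this set is an assumption, not a transcription of the figure.
REFERENCES = {("a", "e")}


def subtree(tree, node):
    """Return `node` followed by all directories below it."""
    members = [node]
    for child in tree.get(node, []):
        members.extend(subtree(tree, child))
    return members


def fine_grained(tree, root):
    """Approach i: every non-root directory is a separate component."""
    return {d: [d] for d in subtree(tree, root) if d != root}


def coarse_grained(tree, root, references):
    """Approach ii: a subdirectory becomes a separate component only if it
    is referenced from outside the hierarchy of its enclosing component."""
    components = {}

    def assign(node, owner):
        components.setdefault(owner, []).append(node)
        inside = set(subtree(tree, owner))
        for child in tree.get(node, []):
            external = any(
                dst == child and src not in inside for src, dst in references
            )
            assign(child, child if external else owner)

    for top in tree.get(root, []):
        assign(top, top)
    return components
```

With these inputs, approach i yields five components, whereas approach ii groups b and c with a and keeps d and e separate.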

Finding References. Directory references serve to determine component dependencies.
That is, if a directory reference from a to b exists and a and b become separate components,
then b becomes a dependency of a.

Directory references are found by inspecting the Automake Makefiles in the source 
tree for directory patterns. For each directory reference found it is checked that it points 
to a directory inside the source tree and that the target directory contains an Automake 
Makefile. Thus, references outside the source tree and references to directories that are 
not part of the build process are discarded. 
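
The inspection of Makefiles can be approximated by a textual scan. The following sketch handles only one kind of directory pattern, namely paths written relative to top_srcdir, and applies the two checks described above. The regular expression and the function name are assumptions made for illustration; the analysis used in this paper may recognize other patterns as well.

```python
import re
from pathlib import Path

# One directory pattern only: $(top_srcdir)/... or ${top_srcdir}/...
TOP_SRCDIR_REF = re.compile(r"\$[({]top_srcdir[)}]/([\w./-]+)")


def directory_references(source_tree):
    """Return sorted (from_dir, to_dir) pairs found in Makefile.am files."""
    root = Path(source_tree).resolve()
    found = set()
    for makefile in root.rglob("Makefile.am"):
        source = makefile.parent
        for match in TOP_SRCDIR_REF.finditer(makefile.read_text()):
            target = (root / match.group(1)).resolve()
            if not target.is_dir():
                target = target.parent  # the pattern named a file
            # Check 1: the target lies inside the source tree.
            inside_tree = root in target.parents
            # Check 2: the target takes part in the build process.
            in_build = (target / "Makefile.am").is_file()
            if inside_tree and in_build and target != source:
                found.add((
                    source.relative_to(root).as_posix(),
                    target.relative_to(root).as_posix(),
                ))
    return sorted(found)
```

The resulting pairs are the arrows of the component dependency graph.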

Fine Tuning. From the information gathered thus far, we can construct a component
dependency graph that models components and their relations. This model serves
as input for the transformation phase discussed below. Fine tuning consists of adapting
the graph to specific needs and repairing problems:

- Additional edges and arrows can be added to the graph, in case the analysis failed
  to find them all automatically.

- The component dependency graph needs to be adapted in case of cyclic dependencies.
  These are not automatically repaired because changing a cycle into a tree and
  selecting a root node cannot be done unambiguously.

- The graph can be adapted to combine certain nodes so that they represent a single
  component rather than separate components.

Fig. 3. A directory hierarchy with directory references represented by arrows.

We use dot [4] to represent component dependency graphs. The adaptations are specified 
as graph transformations, which can be performed automatically. The complete analysis 
phase then becomes an automated process that can be repeated when needed. 
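
One such graph transformation, combining several nodes into a single component, can be sketched as follows. The graph representation, the function names, and the example edges a -> b and b -> c are assumptions made for illustration; only the dot output format is taken from the text.

```python
def merge_nodes(graph, members, merged_name):
    """Combine `members` into one node called `merged_name`.

    `graph` maps each node to the set of nodes it depends on. References
    among the merged members disappear; all other edges are redirected.
    """
    members = set(members)

    def rename(node):
        return merged_name if node in members else node

    result = {}
    for node, targets in graph.items():
        result.setdefault(rename(node), set()).update(rename(t) for t in targets)
    for node, targets in result.items():
        targets.discard(node)  # drop self-references created by the merge
    return result


def to_dot(graph, name="components"):
    """Render the dependency graph in the dot language."""
    lines = [f"digraph {name} {{"]
    for node in sorted(graph):
        lines.append(f'    "{node}";')
        for target in sorted(graph[node]):
            lines.append(f'    "{node}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical dependency graph over the directories of Fig. 3.
GRAPH = {"a": {"b", "e"}, "b": {"c"}, "c": set(), "d": {"e"}, "e": set()}
merged = merge_nodes(GRAPH, {"a", "b", "c"}, "abc")
dot_text = to_dot(merged)
```

Merging is also one way to repair a cycle, since references between the merged members no longer appear in the graph.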

5.2 Source Tree Transformation 

The source tree transformation phase consists of splitting up a source tree into build-level
components, and creating package definitions for each of them. This process is driven 
by the information contained in the component dependency graph constructed during 
the first phase. In the discussion below, we assume that it contains three components, 
capturing the directories abc, d, and e of Fig. 3. 

Creating Components. Creating a build-level component involves: i) isolating its
implementation from the source tree; ii) creating an Autoconf configure script; iii) creating
an Automake Makefile.

To isolate the implementation of a build-level component c from a source tree s, 
the subtree containing its implementation is moved outside s. If the subtree of c has 
subdirectories, which, according to the component dependency graph, belong to other 
components, then these subdirectories are recursively moved outside c. For instance, 
in case of Fig. 3, the subtrees rooted at nodes a and d are placed outside the source 
tree. Component d has a subdirectory e that forms a separate component and is therefore 
moved outside d. The subdirectories of a are not moved because they do not form separate 
components. 

Component-specific Autoconf configure scripts are created from the top-level con- 
figure script. The following adaptations are made: i) The name and version of the original 
system are replaced by the name and version of the component; ii) References to other 
directories are removed. Thus, all files listed in AC_CONFIG_FILES that do not belong to 
the component are removed; iii) Configuration switches are added for each component 
dependency. For component a of Fig. 3 this means that a configuration switch for com- 
ponent e is created. The binding of this switch is accessible in Makefiles as ${E}. The 
resulting Autoconf configure script is tailored for a single component: it instantiates only 
Makefiles of the component and it does not contain hard references to other components.

The most complex task is creating Automake Makefiles. First, this involves removing
directory names for those directories that have become separate components. In particular,
these names are removed from the SUBDIRS variable. If this variable
becomes empty, the variable itself is removed. Second, self-references are changed. In
the original tree, the component e from Fig. 3 might reference itself in different ways,
e.g., as ${top_srcdir}/d/e or ../e. These have to be changed according to the new
directory structure, e.g., to ${top_srcdir}/e. Third, reusable files should be made
publicly accessible in standard locations. The original source tree may contain direct file
references, but these are no longer allowed. For instance, in the original source tree (see
Fig. 3) one can depend on the exact directory structure and access a C header file f.h
in directory b as ${top_srcdir}/a/b/f.h. However, in the new situation this is not
allowed, because a component can only be accessed via its interfaces and one cannot
depend on its internal structure. This implies that for a file to be accessible, it needs to
be placed in a standard location. We accomplish this by replacing noinst_HEADERS and
include_HEADERS variables by pkginclude_HEADERS. This guarantees that the header
file f.h always gets installed in include/a/ relative to some compile-time configurable
directory. Other files, such as libraries, are made accessible in a similar fashion. Fourth,
directory references are changed into component references. This implies that all file
referencing goes via interfaces. For example, the file f.h that belongs to component a
can then be accessed as ${A}/include/a/f.h. The variable A is bound at composition
time. The result is that external references into a component’s source tree no longer exist.
The component can therefore safely change its internal structure when needed.
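
Two of the four rewriting steps, pruning SUBDIRS and publishing header files, can be sketched as simple line-based rewrites. This is an illustration under simplifying assumptions, not the transformation used in this paper: each variable is assumed to be defined on a single line (no backslash continuations), and self-references and component references are not handled. The function name is hypothetical.

```python
import re


def rewrite_makefile_am(text, extracted_dirs):
    """Apply two Makefile.am rewrites for a newly isolated component.

    - Directories that became separate components are removed from SUBDIRS;
      the variable is dropped entirely if it becomes empty.
    - noinst_HEADERS and include_HEADERS become pkginclude_HEADERS, so that
      headers are installed in a standard, package-specific location.
    """
    output = []
    for line in text.splitlines():
        subdirs = re.match(r"^(SUBDIRS\s*=\s*)(.*)$", line)
        if subdirs:
            kept = [d for d in subdirs.group(2).split() if d not in extracted_dirs]
            if not kept:
                continue  # the variable became empty: remove it
            line = subdirs.group(1) + " ".join(kept)
        line = re.sub(r"^(noinst|include)_HEADERS\b", "pkginclude_HEADERS", line)
        output.append(line)
    return "".join(line + "\n" for line in output)


# Hypothetical Makefile.am of component d after e was moved out of it.
ORIGINAL = "SUBDIRS = e doc\nnoinst_HEADERS = f.h\nlib_LTLIBRARIES = libd.la\n"
rewritten = rewrite_makefile_am(ORIGINAL, {"e"})
```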



Creating Package Definitions. The component dependencies of a component are captured
in an automatically generated package definition. This package definition also
contains a standard identification section, containing the name and version of the package
and the location from where it can be retrieved. In addition, a configuration interface
section is constructed by collecting all configuration switches from the Autoconf configure
script.
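
Collecting configuration switches can be approximated with a textual scan of the Autoconf input. The following minimal sketch assumes that switches are declared with AC_ARG_ENABLE and AC_ARG_WITH and that the first macro argument is a literal name. The function name and the sample script are hypothetical, and a robust tool would have to treat m4 quoting more carefully.

```python
import re

# Matches the macro name and its first argument, with or without [ ] quoting.
SWITCH = re.compile(r"AC_ARG_(ENABLE|WITH)\(\s*\[?([A-Za-z0-9_+.-]+)\]?\s*,")


def configuration_switches(configure_source):
    """Return the command-line switches declared in an Autoconf input file."""
    switches = []
    for kind, name in SWITCH.findall(configure_source):
        prefix = "--enable-" if kind == "ENABLE" else "--with-"
        switches.append(prefix + name)
    return switches


# Hypothetical Autoconf input for the dotneato package of Fig. 1.
SAMPLE = """\
AC_INIT(dotneato, 1.0)
AC_ARG_ENABLE(dmalloc, [use dmalloc for debugging memory use])
AC_ARG_ENABLE([efence], [use efence for debugging memory use])
AC_ARG_WITH(cdt, [...], CDT=${withval})
"""

switches = configuration_switches(SAMPLE)
```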



Fine Tuning. A build-level component that results from the procedure above has a
component-specific configuration and build process. Component dependency parameters
can be bound with configuration switches. Now is the time to fine-tune the component
to repair problems resulting from its isolated structure:

- Because circular dependencies are no longer allowed, the implementation of components
  having circular dependencies needs to be fixed. This involves restructuring
  files or creating new components as discussed in Sect. 3.

- The automatic source tree transformation might fail to discover and change all
  directory and file references. These can now be repaired manually.

- Software systems driven by Automake and Autoconf do not always produce complete
  distributions. This means that a distribution does not include all files that are
  referenced by its Makefiles. Build-level components inherit these errors. To make
  them suitable for composition, these errors must be repaired, either by removing
  the references to the missing files from the Makefiles, or by adding the files to the
  EXTRA_DIST variable.

Fig. 4. Subset of Graphviz’s directories and directory references.



The modifications can be defined as patches, such that they can be processed automatically.
This yields a fully automated transformation process. After making sure that
these patches yield a component for which the distcheck build action succeeds, the
component is ready and can be imported into a CM system for further development.

5.3 Package Base Creation 

The last phase in source tree decoupling is to make the components available for use.
This implies that component distributions are created and released, and that an online
package base is generated from the package definitions. Component releases are stored
at the location specified in the generated package definitions. The online package base
is driven by Autobundle. It offers a web form from which component selections can
easily be assembled by pressing a single button. This makes the functionality that was
first entangled in a single source tree separately reusable.

6 Graphviz: A Case Study

In this section we discuss the application of source tree decoupling to a real-world 
application. 

6.1 Graphviz 

Graphviz⁵, developed by AT&T, is an Open Source collection of tools for manipulating
graph structures and generating graph layouts [4]. It consists of many small utilities
that operate on the dot graph format. Its structuring in many small utilities makes it
of general use for all kinds of graph visualization and manipulation problems. This is
affirmed by its adoption in a large number of software systems.

⁵ http://www.graphviz.org

Thus, the functionality offered by Graphviz turns out to be effectively reusable. 
However, for many uses of Graphviz only a small subset of the tool set is actually needed 
(e.g., only the program dot might be needed for visualizing graphs). Graphviz does not 
support reuse at this granularity. This has two important drawbacks for software systems
that are using Graphviz: i) software distributions and installations become unnecessarily 
large and complex; ii) it introduces several dependencies on external software, which 
are in fact not used. 

At the build level we can make some other observations. The Graphviz distribution⁶
contains 264 directories, 1,856 files, and 174 directory references. Graphviz is implemented
in multiple programming languages, including C, C++, Tcl/Tk, AWK, and shell
scripts. The Graphviz implementation consists of more than 300,000 lines of code. In
Fig. 4 we depict a small portion of the build-level structure of Graphviz, containing
directories (as nodes) and directory references (as arrows). Boxes correspond to root
nodes (i.e., directories to which no references exist). From this picture we can make
two observations: i) the many directory references reveal that there is much reuse at the
build level (each directory reference corresponds to a reuse relation from one directory
to another). Despite their reusability, these directories are not available for reuse outside
Graphviz’s source tree; ii) some arrows point in two directions (i.e., the dashed arrows),
indicating circular dependencies between directories. As we pointed out in Sect. 3, this
indicates problems in the structure of Graphviz. In addition, the configuration
process of Graphviz is quite complicated (i.e., more than 5,000 lines of code are
related to build-time configuration of Graphviz). It is therefore hard to extract from the
Graphviz source tree just what is needed, and integration of Graphviz with other software
is painful.



6.2 Restructuring Graphviz 

Due to the aforementioned problems (i.e., Graphviz is too large, it has too many 
external dependencies, its configuration process is too complex, it has cyclic depen- 
dencies, and it contains reusable functionality that is not available for external reuse), 
Graphviz forms a perfect candidate for applying our semi-automatic restructuring tech- 
nique. Below follows a discussion of the different steps that we performed to restructure 
Graphviz. 



Fixing Circular Dependencies. Because of circular dependencies between directories, 
we first had to remove the corresponding cycles from the component dependency graph. 
We defined this adaptation as a simple graph transformation. At the end of the source tree 
transformation phase, we removed circular references from the generated build-level 
components as well. This had little impact, because the references were either unnecessary 
and could simply be removed, or they could be resolved by moving some files. 
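
The detection side of this step can be illustrated with a small sketch. It is not the graph transformation used for Graphviz; the directory names and both function names are hypothetical. It assumes the directory references are available as a mapping from each directory to the set of directories it references, and it reports pairs that reference each other as well as whether any cycle exists at all.

```python
def mutual_references(refs):
    """Return sorted pairs (a, b) of directories that reference each other."""
    pairs = set()
    for src, targets in refs.items():
        for dst in targets:
            if src != dst and src in refs.get(dst, set()):
                pairs.add(tuple(sorted((src, dst))))
    return sorted(pairs)


def has_cycle(refs):
    """Depth-first search for any cycle in the directory reference graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    nodes = set(refs) | {d for targets in refs.values() for d in targets}
    colour = {n: WHITE for n in nodes}

    def visit(n):
        colour[n] = GREY
        for nxt in refs.get(n, ()):
            if colour[nxt] == GREY:          # back edge: a cycle exists
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in sorted(nodes))


# Hypothetical directory references (not taken from the Graphviz source tree).
example_refs = {
    "dir_a": {"dir_b", "dir_c"},
    "dir_b": {"dir_a"},
    "dir_c": {"dir_d"},
    "dir_d": set(),
}
# Dropping the unnecessary reference from dir_b back to dir_a breaks the cycle.
repaired_refs = dict(example_refs, dir_b=set())
```

Whether a reported reference can simply be dropped or requires moving files is a manual decision that the sketch does not attempt to make.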

6 Graphviz version 1.10 

Restructuring. The component structure produced at the first migration phase was not 
completely satisfactory. Some components were too fine-grained and needed to be com- 
bined with others. Therefore we removed some of the nodes and edges from the com- 
ponent dependency graph by means of an automatic graph transformation. In some 
cases we had to move files between components, because they were accessed from one 
component but contained in another. 
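
Combining fine-grained components can be pictured as collapsing nodes of the component dependency graph. The following is a minimal sketch under assumptions, not the transformation that was applied to Graphviz: the component names and the function `merge_components` are hypothetical, and the graph is assumed to be a mapping from each component to the set of components it depends on.

```python
def merge_components(deps, group, merged_name):
    """Collapse the components in `group` into one node named `merged_name`."""
    def rename(component):
        return merged_name if component in group else component

    merged = {}
    for component, targets in deps.items():
        src = rename(component)
        merged.setdefault(src, set())
        for dst in targets:
            dst = rename(dst)
            if dst != src:               # drop edges that became internal
                merged[src].add(dst)
    return merged


# Hypothetical component dependency graph.
example_deps = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"a", "c"}}
combined = merge_components(example_deps, {"a", "b"}, "ab")
```

Edges between members of the merged group disappear, while edges to and from the rest of the graph are redirected to the new node.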

Repairing Makefiles. Graphviz is not prepared for rebuilding distributions. The problem 
is that the Makefiles contain references to files that are contained in Graphviz’s CM 
system but not in Graphviz distributions. Consequently, building a distribution from a 
distribution fails because of missing files. Since build-level composition is based on 
packages, which are independent of a CM system by definition (see Sect. 3), the build- 
level components of Graphviz had to be repaired. This involved adapting the Makefiles 
of components such that all files referenced are also distributed. 

6.3 Graphviz Components 

The restructuring process yielded 47 build-level components. We automatically created 
releases for them and we generated a Graphviz online package base. Finally, we generated 
a new (abstract) package definition called graphviz that depends on all top-level 
components (i.e., corresponding to the boxes of Fig. 4). The corresponding composition 
of components is similar to the initial Graphviz source tree. This demonstrates that we can 
reconstruct the initial Graphviz distribution with build-level composition. In addition, 
we combined the Graphviz package base with additional package bases to make build- 
level compositions of Graphviz components and arbitrary other build-level components. 
This demonstrated build-level CBSE in practice. 



7 Concluding Remarks 

In this paper we argued that software reuse is hampered because the modularization 
principles of strong cohesion and weak coupling are not applied at the build level for 
structuring files in directories. Consequently, files with potentially reusable functionality 
are often entangled in source trees and their build instructions hidden in monolithic 
configuration and build process definitions. The benefits of reusing a module usually do 
not outweigh the effort of isolating it for reuse in other software systems. 
Consequently, reuse is not optimal or too coarse grained. 

Contributions. In this paper we proposed to apply component-based software engineer- 
ing (CBSE) principles to the build level, such that build-level components are accessed 
only via well-defined interfaces. We analyzed bad programming style, practiced in many 
software systems, that breaks CBSE principles. We defined rules for developing “good” 
build-level components. We discussed an automated composition technique for build- 
level components. In order to make our techniques feasible, we defined a semi-automatic 
process for source tree decoupling. It aims at easily migrating existing software to sets 
of build-level components. This process consists of a source tree analysis and a source 
tree transformation phase, where build-level components are identified and isolated to 
form individual reusable components. We demonstrated our techniques by decoupling 
Graphviz into 47 components. 

Discussion. The most prominent shortcoming of our approach is the dependency on 
Autoconf and Automake. However, since these tools are so often used in practice, and 
because many systems are migrating to adopt them (Graphviz is a good example of this), 
we believe that this dependence is acceptable. 

Currently, we are not able to precisely track what the configuration switches and 
environment checks of a component are. Consequently, the per-component generated 
configure scripts need some manual adaptation to remove parts that do not belong to the 
component. Observe however, that only information is removed; the generated compo- 
nents will therefore work with or without this extra information. 

Graphviz was not a toy application to test our techniques with. Since it has migrated 
from a build system without Automake, its build and configuration processes contain 
several inconsistencies, as well as constructs that break Automake principles. The suc- 
cessful migration of Graphviz therefore strengthens our confidence in the feasibility of 
our techniques. Nevertheless, we look forward to applying our techniques to additional 
software systems. 

Related work. Koala is a software component model that addresses source code 
composition [15]. Unlike our approach, Koala is concerned only with C source code. Component 
composition therefore involves composing individual C source modules and defining a 
sequence of compiler calls. Because it is tailored towards a single programming language, 
Koala has more control over the composition process at the price of less genericity. For 
example, adopting non-Koala components is therefore difficult. 

Reengineering build and configuration processes, and decoupling source trees into 
components is a research topic that is not so well addressed. Holt et al. emphasize that the 
comprehension process for a larger software system should mimic the system’s build 
process [6]. Their main concern is understanding the different pre-compile, compile, 
and link steps that are involved in a build process, not restructuring source code, or 
making build and configuration processes compositional. In [18], the notion of build- 
time architectural views is explored. They model build-time relations between subparts 
of complex software systems. They do not consider the structuring of files in directories 
and splitting up complex software systems in individual reusable parts. 

There exist several clustering techniques that help to capture the structure of existing 
software systems [11]. In [2], a method is described for finding good clusterings of 
software systems. Such clusters correspond to use-relations (such as calling a method, or 
including a C header file). A cluster therefore does not always correspond to a component. In 
our approach we use the directory structure for clustering source code into components. 

It is sometimes argued that build knowledge should not be spread across directories 
at all, but contained in a single Makefile [14]. The motivation is that only in a single 
Makefile, completeness of build dependencies can be achieved. This is merely due 
to limitations of traditional make implementations. Unless such global Makefiles are 
generated, they completely ignore modularization principles necessary for decomposing 
directory structures. 

Abstract. A reuse repository is an essential element in component-based software 
development (CBSD). Querying-based retrieval and browsing-based retrieval are the two 
main retrieval mechanisms provided in real-world reuse repositories, especially 
web-based repositories. Although browsing-based retrieval is superior to querying-based 
retrieval in some aspects, the tedious retrieval process is its main drawback. In this 
paper, we propose a novel approach to overcoming this drawback. The basic idea of our 
approach is to rank the attributes using information entropy theory. According to our 
experimental results on real data, our approach can effectively reduce the length of the 
retrieval sequence. In other words, this approach helps accelerate the process of 
browsing-based retrieval. 



1 Introduction 

It has been widely accepted that it is essential for component-based software 
development (CBSD) to be based on reuse repositories storing a large number of 
components [2] [8]. With the prevalence of the Internet, most real-world repositories are 
accessible through web-based interfaces (see e.g. ComponentSource [3], Download.com [5], 
SourceForge [10] and Open-Components [9]). 

To help users retrieve components efficiently, reuse repositories typically provide 
representation methods to index the components in them. In the literature, three 
categories of representation methods have been identified: information science methods, 
artificial intelligence methods and hypertext methods [6] [7]. Along with one or more 
representation methods, a reuse repository also provides one or more retrieval 
mechanisms to match users’ requirements for components against the representation of 
each component in the repository. In general, there are two kinds of retrieval 
mechanisms for this matching: querying-based retrieval and browsing-based retrieval 
[11] [12] [13]. A web-based repository usually provides both mechanisms. 

In querying-based retrieval, the user expresses his/her requirements as a query. In 
the query, the requirements are combined through logical operators (i.e. “NOT”, 
“AND” and “OR”). The main advantage of this approach is that a user can use one 
query to express all his/her requirements. The main disadvantage is that the user must 
be very familiar with the representation methods used in the repository and must use 
the same terms as those methods when formulating his/her query. 
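
To make querying-based retrieval concrete, here is a minimal sketch under assumptions. The tuple-based query encoding, the function `matches`, and the three components are all hypothetical; each component is assumed to be represented as a set of (attribute, value) pairs. The sketch shows why the user has to know the exact terms: a misspelled attribute or value simply matches nothing.

```python
def matches(component, query):
    """Evaluate a nested query of TERM / NOT / AND / OR against one component."""
    op = query[0]
    if op == "TERM":
        return (query[1], query[2]) in component
    if op == "NOT":
        return not matches(component, query[1])
    if op == "AND":
        return all(matches(component, q) for q in query[1:])
    if op == "OR":
        return any(matches(component, q) for q in query[1:])
    raise ValueError(f"unknown operator: {op}")


# Hypothetical repository: component name -> set of attribute-value pairs.
repository = {
    "comp1": {("language", "Java"), ("os", "Linux")},
    "comp2": {("language", "C"), ("os", "Linux")},
    "comp3": {("language", "Java"), ("os", "Windows")},
}
query = ("AND", ("TERM", "os", "Linux"), ("NOT", ("TERM", "language", "C")))
result = sorted(name for name, rep in repository.items() if matches(rep, query))
```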

In contrast, browsing-based retrieval can help an inexperienced user to browse 
through the representation methods during the retrieval process, because the user can 
provide his/her requirements interactively while browsing. In this kind of retrieval, 
the user can always select the links displayed on the screen without knowing the 
representation methods beforehand. However, this approach has an obvious disadvantage: 
the browsing process may be very tedious and time-consuming, because each browsing step 
adds only a small amount of information about the requirements. The problem becomes 
more severe as the repository grows larger [11]. Therefore, methods that can accelerate 
the browsing process are essential for browsing-based retrieval. 

In this paper, we propose an entropy-based approach to accelerating the process of 
browsing-based component retrieval. In our approach, we use information entropy to 
calculate the importance of each attribute used to index the components in the 
repository. The user can then browse the attributes in the order of their importance 
values and find the target components efficiently. 

The remainder of this paper is organized as follows. Section 2 describes the problem 
in a precise way. Based on the problem description in Section 2, we present our 
approach in Section 3. In Section 4, we present the results of an experiment carried 
out on data acquired from SourceForge. Section 5 further discusses some related issues 
and concludes this paper. 



2 The Problem of Accelerating Browsing-Based Retrieval 

In this section, we present a process model for browsing-based component retrieval, 
which can enable us to precisely define the problem of accelerating the retrieval. 
Based on this model, we illustrate this problem through a motivating example. 

2.1 A Process Model for Browsing-Based Retrieval 

The process of browsing-based component retrieval is summarized in Fig. 1. In the 
retrieval process, the user is presented with a screen containing many links. After 
an overview of the screen, the user selects the link most interesting to him/her and 
clicks on it. This results in a retrieval step from the repository. The retrieved 
components, along with some further links, are displayed on the next screen. The user 
can continue clicking and retrieving until the set of retrieved components is so 
confined that the user can easily determine whether the requested components are 
contained in it. In effect, the retrieval process can be viewed as the process of 
building up a query that retrieves the final set. Each click adds more requirements to 
the query and further confines the retrieved set. From this process model, we can see 
that the user does not need any knowledge about the representation methods in the 
repository before retrieving. All this knowledge can be built up while scanning the 
links on each displayed page. 
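
The process model can be sketched in a few lines. This is only an illustration under assumptions: the function `browse`, the component data, and the choice of "fewer than `threshold` components" as the stopping rule are hypothetical, and each attribute is assumed to have a single value per component. Each click is one attribute-value pair that is added to the implicit query.

```python
def browse(repository, clicks, threshold):
    """Apply clicks one at a time until fewer than `threshold` components remain.

    Returns the sorted names of the remaining components and the steps used.
    """
    remaining = dict(repository)
    steps = 0
    for attribute, value in clicks:
        if len(remaining) < threshold:       # confined enough: stop browsing
            break
        remaining = {
            name: rep for name, rep in remaining.items()
            if rep.get(attribute) == value
        }
        steps += 1
    return sorted(remaining), steps


# Hypothetical repository: component name -> attribute-value representation.
example_repository = {
    "comp1": {"language": "Java", "os": "Linux", "license": "GPL"},
    "comp2": {"language": "Java", "os": "Linux", "license": "BSD"},
    "comp3": {"language": "Java", "os": "Windows", "license": "GPL"},
    "comp4": {"language": "C", "os": "Linux", "license": "GPL"},
}
clicks = [("language", "Java"), ("os", "Linux"), ("license", "GPL")]
found, steps_used = browse(example_repository, clicks, threshold=3)
```

In this toy run the third click is never needed, because two clicks already confine the set below the threshold.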

We should note here that the model is a general model. It is not restricted to any 
particular representation method. In an actual reuse repository, the links displayed on 
the screen are usually specific to a particular representation method. For example, for 
the attribute-value method or the faceted method, a link might represent an attribute, a 
facet, or a value for the attribute or the facet. 

Fig. 1. A process model for browsing-based component retrieval 



2.2 A Motivating Example 

Typically, in browsing-based retrieval, links displayed on each screen are mapped 
from the classification schemes used in the repository. However, the hierarchical 
classification schemes adopted by most reuse repositories are very inflexible, and 
this inflexibility usually makes the retrieval process tedious. Therefore, there is a 
great need for an approach to accelerating browsing-based retrieval. 

To further illustrate our motivation for this research, we present a real-world 
example in this sub-section. Fig. 2 shows an example based on a subset of components 
distilled from SourceForge. In this sub-repository, each component is represented by 
seven attributes, which are depicted in Fig. 2(a). Therefore, the requirements for 
retrieving a target component can be represented as a set of values, one for each of 
the seven attributes, as depicted in Fig. 2(b). 

If the user is familiar with the terms used for attributes and values in the repository, 
he/she can form the query, which is depicted in Fig. 2(b). If this query is used in que- 
rying-based retrieval, the target component can be directly retrieved. However, if the 
user is unfamiliar with the terms used in the repository and has to browse through the 
attributes and values, usually a long retrieval sequence is needed. Fig. 2(c) illustrates 
three different retrieval sequences representing different strategies in the browsing. 

From Fig. 2(c), it is easy to see that different browsing strategies differ in how 
effectively they reach the final component set. If the threshold number for 
determining whether the retrieved set is confined enough is set to 5, five retrieval 
steps are needed to get the confined set for the first retrieval sequence, while the 
numbers of steps for sequence 2 and sequence 3 are two and four respectively. 
Obviously, the second sequence is the best among the three. This difference in 
effectiveness can also be found among retrieval sequences for most other components. 
Therefore, if the repository can guide the user toward the most effective retrieval 
sequence, the retrieval process can be accelerated. 
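
The effect of the attribute order can be reproduced on toy data. The sketch below does not use the data of Fig. 2; the eight components, the three attributes, the threshold of 3, and the function `steps_needed` are hypothetical. It counts how many retrieval steps a given attribute order needs before fewer than `threshold` components remain.

```python
def steps_needed(repository, target, order, threshold):
    """Number of steps until fewer than `threshold` components remain, else None."""
    remaining = list(repository)
    for step, attribute in enumerate(order, start=1):
        remaining = [c for c in remaining if c[attribute] == target[attribute]]
        if len(remaining) < threshold:
            return step
    return None


# Hypothetical sub-repository; the first component is the target.
example_repository = [
    {"language": "Python", "os": "Linux", "license": "MIT"},
    {"language": "Java", "os": "Linux", "license": "MIT"},
    {"language": "C", "os": "Linux", "license": "MIT"},
    {"language": "C++", "os": "Linux", "license": "MIT"},
    {"language": "Python", "os": "Windows", "license": "MIT"},
    {"language": "Java", "os": "Linux", "license": "GPL"},
    {"language": "C", "os": "Linux", "license": "MIT"},
    {"language": "Go", "os": "Linux", "license": "MIT"},
]
target = example_repository[0]

language_first = steps_needed(example_repository, target, ["language", "os", "license"], 3)
license_first = steps_needed(example_repository, target, ["license", "os", "language"], 3)
os_first = steps_needed(example_repository, target, ["os", "language", "license"], 3)
```

Starting with the attribute that splits the components most evenly finishes in one step here, while starting with an attribute that most components share takes three.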

Fig. 2. A motivating example 



3 The Attribute Ranking Approach 

From the motivating example, we can see that different attributes can have different 
capability in confining the retrieved component set. In this paper, we refer to this 
capability as the importance. Thus, if more important attributes can be used earlier in 
the retrieval process, the retrieved component set can be confined to be smaller than 
the threshold more quickly. Therefore, the problem of how to accelerate browsing- 
based retrieval can be turned into the problem of how to correctly rank the attributes 
according to their importance. Of course, the idea of ranking attributes is primarily 
applicable to the attribute-value representation method. As the faceted method can be 
viewed as a variant of the attribute-value method, this idea can be easily adapted to 
the faceted method. 

In order to evaluate the importance of every attribute, we employ information 
entropy theory. In the remainder of this section, we briefly introduce information 
entropy theory and then present our method of evaluating the importance of each 
attribute and calculating the attribute ranking. 



3.1 The Information Entropy Theory 

Information entropy theory can be traced to the 1940s. In the past decade, it has 
mainly been studied in the field of data mining and knowledge discovery. The so-called 
information entropy [1] is a measure of how much information the answers to a specific 
question can provide. For a given question P, if there are n possible answers to the 
question, and the probability of the occurrence of the ith answer is $p_i$, the 
information entropy of P is defined in (1). For convenience, we denote the above 
situation as $P = (p_1, p_2, \ldots, p_n)$. 

$$E(P) = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (1)$$

According to the information entropy theory, the larger the entropy of a question 
is, the more information an answer of the question may provide. 
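
As a small numerical illustration of definition (1), the following sketch computes the entropy of a question from the probabilities of its answers. The function name `entropy` is an assumption made for this example, and answers with probability 0 are skipped by the usual convention that they contribute nothing.

```python
import math


def entropy(probabilities):
    """Information entropy (base 2) of a question with the given answer probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


two_equal_answers = entropy([0.5, 0.5])
four_equal_answers = entropy([0.25, 0.25, 0.25, 0.25])
skewed_answers = entropy([0.9, 0.1])
```

Four equally likely answers carry more information than two, and a skewed pair of answers carries less than an even pair.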

3.2 Evaluating the Importance of an Attribute 

Supposing there are s components, let us consider the entropy of an attribute that has 
n distinct values. Supposing the ith (i = 1, 2, ..., n) value can represent $c_i$ 
components, we have $\sum_{i=1}^{n} c_i = s$. Assuming the probability of each 
component being the requested component is the same, if we view the attribute as a 
question and the n values as n answers to the question, the entropy of the attribute 
(denoted as $A = (c_1, c_2, \ldots, c_n)$) can be calculated using formula (2). 

$$E(A) = -\sum_{i=1}^{n} \frac{c_i}{s} \log_2 \frac{c_i}{s} \qquad (2)$$

With some further calculation, formula (2) can be transformed into formula (3). 

$$E(A) = \log_2 s - \sum_{i=1}^{n} \frac{c_i}{s} \log_2 c_i \qquad (3)$$

From formula (3), we know that the more values an attribute has and the less skewed 
the distribution of the components under these values is, the more important the 
attribute is. When there is no component under the ith value (i.e. $c_i = 0$), we 
should assign the value 0 to $\frac{c_i}{s} \log_2 c_i$ in formula (3); otherwise 
formula (3) will be meaningless. 
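
The two ways of computing the importance can be checked numerically. The sketch below is an illustration under assumptions: the function names and the example counts are hypothetical. The first function substitutes $p_i = c_i / s$ into definition (1); the second uses the expanded form $\log_2 s - \sum_i \frac{c_i}{s} \log_2 c_i$. Values with $c_i = 0$ are skipped, which corresponds to assigning 0 to their term.

```python
import math


def importance_direct(counts):
    """Entropy of an attribute, using p_i = c_i / s in definition (1)."""
    s = sum(counts)
    return -sum((c / s) * math.log2(c / s) for c in counts if c > 0)


def importance_expanded(counts):
    """The same quantity written as log2(s) - sum((c_i / s) * log2(c_i))."""
    s = sum(counts)
    return math.log2(s) - sum((c / s) * math.log2(c) for c in counts if c > 0)


even_two = [4, 4]         # two values, evenly distributed
even_four = [2, 2, 2, 2]  # more values, evenly distributed
skewed_two = [7, 1, 0]    # skewed, with one value that has no component
```

An attribute with more values and a less skewed distribution receives the higher importance in both forms.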



3.3 The Algorithm for Building the (Partial) Ranking Tree 

After we obtain the importance of each attribute, we can rank all the attributes 
according to their importance values. However, in each retrieval step, the set of 
retrieved components differs from the set used in the previous step. Therefore, the 
most important attribute for each step should be calculated based on the set of 
components acquired in the previous step. 

Obviously, the selection can be calculated after the component set for each step is 
determined during the retrieval process. However, if there are many users browsing 
the repository simultaneously, this strategy may decrease the performance of the re- 
pository greatly. In the following, we present a tree structure (denoted as the ranking 
tree) to store all the possible sequences of attributes. When there are too many attrib- 
utes and/or too many values for these attributes, the ranking tree may become too 
large. In such a case, we can calculate just part of the ranking tree, and when the at- 
tribute selection is out of the range of the partial ranking tree, the selection has to be 
calculated based on the currently retrieved component set. 

The ranking tree is defined as follows. Each node in the tree represents an attribute, 
and the link between a node and one of its sub-nodes represents a value of the 
attribute represented by the parent node. Thus, a route from the root node to a node in 
the tree consists of a sequence of alternating nodes and links. Such a route 
corresponds to a retrieval sequence. 

Supposing there are n components in the repository, we use an array R[1..n] to denote 
all the representations of the components. Each element in R is a set of 
attribute-value pairs that represent the corresponding component. Supposing there are m 
attributes, we use a set A to store all the m attributes, and an array V[1..m] to 
denote the values for the attributes. Each element in V is a set of values for the 
corresponding attribute. The algorithm for calculating the (partial) ranking tree is 
depicted in Fig. 3. The maximum tree depth d is used to control the size of the tree. 
If d is larger than m, the calculated ranking tree is full. In this algorithm, a 
recursive procedure named BuildSubNodes is used to select the attribute for one node 
and build its sub-nodes. The algorithm for BuildSubNodes is depicted in Fig. 4. 



Input: R, A, V and the maximum tree depth d 
Output: the (partial) ranking tree 
Step 1: Set RankingTree to be empty; 
        Create an attribute node N; N.depth = 1; 
        Set N as the root of RankingTree; 
Step 2: BuildSubNodes(R, A, V, N, RankingTree, d); 



Fig. 3. The algorithm for building the (partial) ranking tree 



Input: R, A, V, N, RankingTree and d 
Step 1: A' ← A; 
Step 2: Find the route r from the root of RankingTree to N; 
Step 3: CurrentComponents ← ∅; 
        For each element R[i] in R 
          If all the attribute-value pairs in r are in R[i] 
            CurrentComponents ← CurrentComponents ∪ {i}; 
Step 4: If CurrentComponents is empty 
          Return; 
Step 5: Find the most important attribute a in A' using CurrentComponents; 
        A' ← A' − {a}; 
        N.attribute ← a; 
Step 6: If A' is empty 
          Return; 
Step 7: If N.depth equals d 
          Return; 
Step 8: Find the set of values for a in V, denoted as values; 
        For each value v in values 
        Begin 
          Create an attribute node N'; N'.depth = N.depth + 1; 
          Link N' to N as a sub-node of N; 
          Label the link with v; 
          BuildSubNodes(R, A', V, N', RankingTree, d); 
        End 



Fig. 4. The algorithm for BuildSubNodes 



With the ranking tree, the user always starts from the root attribute in the tree, 
and when the user determines a value for the attribute, he/she goes to the 
corresponding sub-node. Therefore, the retrieval process amounts to following a route 
in the ranking tree. 
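
The recursive construction can be sketched compactly in Python. This is an illustration under assumptions rather than the implementation used in the experiment: the nested-dictionary tree layout, the function names, the alphabetical tie-breaking, and the toy components are hypothetical. Instead of recomputing the route from the root at every node, the sketch passes the already filtered component set down the recursion, and it omits sub-nodes for values under which no component remains.

```python
import math
from collections import Counter


def importance(components, attribute):
    """Entropy of `attribute` over `components` (each a dict attribute -> value)."""
    counts = Counter(c[attribute] for c in components)
    s = len(components)
    return -sum((n / s) * math.log2(n / s) for n in counts.values())


def build_ranking_tree(components, attributes, values, max_depth, depth=1):
    """Return {"attribute": a, "children": {value: subtree}}, or None if empty."""
    if not components or not attributes:
        return None
    # Pick the most important remaining attribute (ties broken by name).
    best = max(sorted(attributes), key=lambda a: importance(components, a))
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    # Stop when no attribute is left or the depth limit is reached.
    if not remaining or depth == max_depth:
        return node
    # One sub-node per value of the chosen attribute that still has components.
    for v in sorted(values[best]):
        subset = [c for c in components if c[best] == v]
        child = build_ranking_tree(subset, remaining, values, max_depth, depth + 1)
        if child is not None:
            node["children"][v] = child
    return node


# Hypothetical toy repository with two attributes.
example_components = [
    {"language": "Python", "os": "Linux"},
    {"language": "Java", "os": "Linux"},
    {"language": "C", "os": "Linux"},
    {"language": "C", "os": "Windows"},
]
example_values = {"language": {"C", "Java", "Python"}, "os": {"Linux", "Windows"}}
full_tree = build_ranking_tree(example_components, ["language", "os"], example_values, 2)
partial_tree = build_ranking_tree(example_components, ["language", "os"], example_values, 1)
```

With a maximum depth of 1 only the root is computed, which corresponds to a partial ranking tree whose deeper selections would have to be calculated on demand.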



4 An Experimental Study 



To evaluate our approach, we performed an experimental study on some real data 
acquired from SourceForge [10]. In this section, we report the experimental study in 
detail. 

4.1 The Experiment 

SourceForge is a web-based reuse repository that aims to provide software developers 
with software packages that can be integrated into new software. As SourceForge is a 
very large commercial reuse repository, it is not feasible for us to perform the 
entire experimental study on SourceForge. Therefore, we selected 306 components in 
SourceForge, and used all the attributes and values related to these components in 
SourceForge to set up an experimental repository. 

As in SourceForge, each component in the experimental repository is labeled by 
eight attributes: "Development Status", "Environment", "Intended Audience", "Li- 
cense", "Natural Language", "Operating System", "Programming Language" and 
"Topic". Based on the components and the attributes in the experimental repository, 
we build up the ranking tree using the algorithm depicted in Fig. 3. 

4.2 The Calculated Ranking Tree 

The ranking tree calculated in our experimental study is depicted in Fig. 5. Due to 
space limitations, some branches of the ranking tree are omitted and replaced by an 
ellipsis. Each node in the ranking tree is labeled with a component attribute name; 
each branch of the node is labeled with a corresponding value of this attribute. Thus 
every path from the root node to a leaf represents a retrieval sequence. 




Fig. 5. The ranking tree calculated in our experimental study 

Based on this ranking tree, we build a prototype navigator, which can always select 
the most important attribute in each retrieval step. In other words, the navigator can 
always give the results of our approach. 



4.3 Results and Analysis 

To evaluate the performance of our approach, we compare it to the random attribute 
selection strategy, which always randomly selects the next attribute in each retrieval 
step. We use all the components in the experimental repository to test the performance 
of our approach and that of the random attribute selection strategy. When retrieving 
one of these components, our approach is used once, while 100 random retrieval 
sequences are used for comparison. For each component, the number of retrieval steps 
for our approach and the average number of retrieval steps of the 100 random sequences 
are recorded. Based on the performance data of each component, we can compare the 
overall performance of the two approaches. In this experiment, once the number of 
retrieved components is under the threshold 5, we regard the retrieval as completed. 
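
The comparison protocol can be sketched as follows. This is not the experimental code and does not use the SourceForge data: the eight toy components, the threshold of 2, the seed, and all function names are hypothetical. For one target component, the sketch counts the steps of the entropy-based strategy once and averages the steps of 100 random attribute orders.

```python
import math
import random
from collections import Counter


def entropy_of(components, attribute):
    """Entropy of `attribute` over the currently retrieved components."""
    counts = Counter(c[attribute] for c in components)
    s = len(components)
    return -sum((n / s) * math.log2(n / s) for n in counts.values())


def count_steps(components, target, choose_next, attributes, threshold):
    """Steps until fewer than `threshold` components remain (or attributes run out)."""
    remaining, unused, steps = list(components), list(attributes), 0
    while len(remaining) >= threshold and unused:
        attribute = choose_next(remaining, unused)
        unused.remove(attribute)
        remaining = [c for c in remaining if c[attribute] == target[attribute]]
        steps += 1
    return steps


def by_entropy(remaining, unused):
    """Entropy-based strategy: pick the most important unused attribute."""
    return max(sorted(unused), key=lambda a: entropy_of(remaining, a))


def make_random_chooser(rng):
    """Random strategy: pick any unused attribute."""
    return lambda remaining, unused: rng.choice(unused)


# Hypothetical toy repository and target component.
toy_components = [
    {"topic": f"t{i}", "os": "Linux" if i % 2 == 0 else "Windows", "license": "GPL"}
    for i in range(8)
]
toy_attributes = ["topic", "os", "license"]
toy_target = toy_components[0]

entropy_steps = count_steps(toy_components, toy_target, by_entropy, toy_attributes, 2)
rng = random.Random(0)  # fixed seed so that the run is repeatable
random_average = sum(
    count_steps(toy_components, toy_target, make_random_chooser(rng), toy_attributes, 2)
    for _ in range(100)
) / 100
```

On this toy data the entropy-based strategy needs a single step, which no random order can beat; real repositories need not behave this cleanly.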

Below, we present the results of the experimental study and the corresponding analysis 
from two angles. Firstly, we record the number of retrieval steps required for 
retrieving each component. This result is analyzed to demonstrate the ability of our 
approach to shorten the retrieval sequence. Secondly, we present a comparison of the 
average numbers of resulting components in each retrieval step for the two approaches. 
This demonstrates the ability of our approach to confine the resulting component sets 
in all the retrieval steps. 

The Numbers of Retrieval Steps 

In Fig. 6, we present two lines, which show the relationship between the number of 
retrieval steps and the number of components whose retrieval can be completed in that 
number of retrieval steps. Line A corresponds to our retrieval strategy, while line B 
is drawn according to the average result of 100 random retrieval sequences. 




Fig. 6. Relationship between the number of retrieval steps (horizontal axis) and the 
number of components whose retrieval can be completed under that number of retrieval steps 

As we can see in line A, under our retrieval strategy, 14 components can be retrieved 
in one step, 212 components can be retrieved in two steps, and the remaining 80 
components can be retrieved in three steps. That is to say, all the components in our 
experimental repository can be retrieved within three steps, and about 73.86% of the 
components can be retrieved within two steps. In line B, on average 12.55 components 
can be retrieved in one step, 90.1 components in two steps, and 116.65 components in 
three steps. That is to say, the remaining 86.7 components need more than three 
retrieval steps, and only 33.55% of the components can be retrieved within two steps. 
The result shows that our approach effectively reduces the number of retrieval steps 
for most components in the experimental repository. 
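
The percentages quoted above follow directly from the reported counts, as the short calculation below shows. Only the variable and function names are assumptions; the numbers are the ones given in the text.

```python
TOTAL_COMPONENTS = 306

# Components whose retrieval completes in exactly 1, 2 or 3 steps.
line_a = {1: 14, 2: 212, 3: 80}            # entropy-based strategy
line_b = {1: 12.55, 2: 90.1, 3: 116.65}    # average over 100 random sequences


def share_within(step_counts, max_steps, total):
    """Percentage of components retrieved within `max_steps` steps."""
    done = sum(n for steps, n in step_counts.items() if steps <= max_steps)
    return 100 * done / total


share_a = round(share_within(line_a, 2, TOTAL_COMPONENTS), 2)
share_b = round(share_within(line_b, 2, TOTAL_COMPONENTS), 2)
left_after_three_b = round(TOTAL_COMPONENTS - sum(line_b.values()), 2)
```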

The Average Numbers of Components in Each Retrieval Step 

In Fig. 7, we also present two lines, which show the relationship between the number 
of retrieval steps and the average number of components resulting from each retrieval 
step. Line A shows how the average number of retrieved components changes in each step 
when retrieving all the 306 components in our experimental repository under our 
retrieval strategy. Line B shows the average results for the 100 random retrieval 
sequences. 

As we can see in line B, the average number of retrieved components after the first 
retrieval step is 78.95, and this number is reduced to 19.06 after the second retrieval 
step, 5.22 after the third step, and 1.43 after the fourth step. In line A, the average 
number of retrieved components after the first retrieval step is 24.04, only 30.45% of 
the figure for line B; it is 3.53 after the second retrieval step, only 18.52% of that 
for line B, and 0.61 after the third step, only 11.69% of that for line B. This result 
shows that, under our strategy, the set of retrieved components in each retrieval step 
is much more confined than under the random retrieval strategy. 
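
The ratios between the two lines can be recomputed from the averages given in the text. Only the variable names are assumptions.

```python
# Average number of retrieved components after each retrieval step.
line_a = [24.04, 3.53, 0.61]         # entropy-based strategy, steps 1 to 3
line_b = [78.95, 19.06, 5.22, 1.43]  # random strategy, steps 1 to 4

# Line A as a percentage of line B for the steps both lines share.
ratios = [round(100 * a / b, 2) for a, b in zip(line_a, line_b)]
```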




Fig. 7. Comparison of the number of retrieved components in each retrieval step 
(horizontal axis: step sequence number, 1 to 7) 

These experiments show that our approach effectively confines the retrieved components 
in each retrieval step and is also effective in shortening the component retrieval 
sequences. As we have discussed, for browsing-based component retrieval, if both the 
number of resulting components in each retrieval step and the number of retrieval steps 
are reduced, the whole retrieval process is accelerated. So, based on the results of 
the experiments in this section, we can conclude that our entropy-based attribute 
ranking approach effectively accelerates browsing-based component retrieval. 



5 Discussion and Conclusions 

As mentioned above, retrieving components in a reuse repository is actually matching 
the user’s requirements against the component representations. For browsing-based 
retrieval, as the retrieval process is usually very tedious due to the inflexible 
hierarchical classification schemes, it is necessary to accelerate the retrieval. Due 
to the nature of the retrieval, there may be two ways to optimize the retrieval: 
optimization from the user’s angle and optimization from the repository’s angle. 
Previously, this optimization problem has mainly been addressed from the user’s angle 
(see e.g. [4] and [13]). The approach in [4] uses history knowledge to help users 
retrieve components, while the approach in [13] uses domain knowledge. However, as the 
repository can easily acquire complete knowledge of all its components, an approach 
that provides optimized retrieval guidance for users from the repository’s angle 
should be cheap, effective and thus desirable. 

In this paper, we have proposed an approach to accelerating browsing-based component 
retrieval from the repository’s angle. The basis of our approach is information 
entropy theory. According to our experimental results, our approach can effectively 
accelerate the retrieval process. We think our work is just the beginning of research 
on retrieval optimization from the repository’s angle, and we would like to see more 
work on this issue. 

In the future, we plan to apply more data mining techniques to acquire useful
knowledge from reuse repositories, including statistical knowledge and semantic
knowledge, which may further help users retrieve components. Furthermore, we are
also looking at the possibility of combining the above two angles to achieve better
results in retrieval optimization.



Abstract. We describe reuse as a process of matching needs with product
descriptions. The challenges of reuse stem from the fact that neither needs nor
products are ever fully described. The process of uncovering the true meaning
of both the need and the candidate product is, we argue, a process of ontology
negotiation, that is, of mapping the terms of one ontology to those of another
through a series of questions and answers. Without such a negotiation process,
ontologies themselves provide little value for reuse beyond existing practice in 
domain modeling. We describe the ontology negotiation process as it occurs in 
a space-system application in which end-users “program” new functions by 
composing reusable components. 



1 Reuse as a Knowledge-Based Transaction 

Reuse may be viewed as a transaction involving a provider and a consumer. The con- 
sumer has a need, the provider has a product (perhaps more than one), and the con- 
sumer must decide whether the provider's product meets her need. If so, the transac- 
tion takes place; if not, the consumer looks elsewhere, or makes a decision not to 
reuse. 

Viewing reuse as a transaction highlights the crucial role of a product's description. 
But not only of the product's description: also the articulation of the need. The con- 
sumer reasons with these descriptions in order to determine whether the product 
meets the need. Formally, if the need is articulated in an assertion R(x) where x
represents the required product, and the product under consideration is described by an
assertion P(x), then the consumer must determine whether P(x) → R(x), where the
arrow indicates implication, i.e., P(x) implies R(x).
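
As a minimal illustration of this implication check, the sketch below treats P and R as
Python predicates over a small, finite set of candidate products and tests P(x) → R(x) by
enumeration. The attribute names (latency_ms, lossless) and the finite domain are
assumptions made only for this example; descriptions in full first-order logic would
call for a theorem prover rather than enumeration.

```python
from itertools import product


def implies(p, r, domain):
    """Return (True, None) if p(x) -> r(x) holds for every x in domain,
    otherwise (False, x) where x is the first counterexample found."""
    for x in domain:
        if p(x) and not r(x):
            return False, x
    return True, None


# Hypothetical finite domain: every combination of two illustrative attributes.
domain = [
    {"latency_ms": latency, "lossless": lossless}
    for latency, lossless in product([10, 50, 200], [True, False])
]


def P(x):
    """What the provider's description promises about the product (assumed)."""
    return x["latency_ms"] <= 50 and x["lossless"]


def R(x):
    """What the consumer has articulated as the need (assumed)."""
    return x["latency_ms"] <= 100


holds, counterexample = implies(P, R, domain)
print(holds, counterexample)
```

Checking the converse, implies(R, P, domain), yields a counterexample: a product that
meets the stated need without matching the provider's description.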

One obvious challenge in reuse is to prove this implication, or to prove that it is 
false. If P and R are complex assertions in first-order logic (let alone higher-order 
logic), this can be a non-trivial challenge. It is, however, the least of the challenges 
posed by the situation. 

A far greater challenge is posed by the fact that both P and R are usually incom- 
plete descriptions. Even if the candidate product has been specified in a formal lan- 
guage, and implemented through a rigorous (for example, machine-checked) process 
of transforming the specification into executable code, the chances are overwhelming 
that it is still incomplete. It may completely characterize the functional properties of 
the product, but function is not the only dimension that must be considered in
developing a software system. What about performance? Size? Conformance to standards?
Use of lower-level services (such as those provided by an operating system), which
might not be formally described?

The situation with R, the specification of need, is even worse. Needs are always
articulated in the context of certain assumptions about the development environment, 
the intended operating environment, the stakeholders involved, the resources avail- 
able, and other factors. Requirements analysis attempts to make such assumptions ex- 
plicit, but they can never all be explicit, or they would not be actionable or even com- 
prehensible. In addition, needs may change in the face of unforeseen events. For 
example, response-time requirements may become more stringent because a competi- 
tive product has appeared on the market. 

It is the incompleteness of P and R that makes reuse difficult. While proving
P(x) → R(x) (or, at least, convincing oneself of its truth) is necessary, it is not sufficient to
ensure successful reuse. The consumer must really try to determine whether
P'(x) → R'(x), where P' and R' are the idealized complete descriptions of the candidate
product and the needs. P' and R' are “idealized” in the sense that they have not been
explicitly articulated. For example, R' must include (as negated assertions) all conditions
that must be prevented.

One way to move closer to P' and R' is to ask questions about them. For example: 
what would happen if the candidate product is used in combination with a certain 
other component? What experiences have previous users of the product had? Would 
any product satisfying R be acceptable? Can one think of a scenario in which it would 
not be acceptable? 

Such questions help to clarify the risks involved in reusing the candidate product. 
Answers to the questions might help to mitigate some of the risks by exposing infor- 
mation that might not be immediately available in the product description or in the 
description of need. 

Ultimately, the questions are an attempt to establish the real meaning of R' and P',
i.e., of the need and of the product. Seen in this light, the decision whether or not to 
reuse a certain product becomes a question of ontology: What, really, is the need? 
What, really, is this candidate product? 



2 Ontology and Reuse 

Ontology is the study of “being,” the study of what is, of what exists. Historically, this 
has been an area of philosophy. However, over the past decade or two, ontology has 
been adopted as a field of investigation in artificial intelligence (AI). One might call 
the resulting field artificial ontology since, as studied in AI, the goal is to develop 
models of what exists. Ontology in AI is less interested in what exists in the world at 
large than in what exists and is important to model in a given application domain. 
“An” ontology is, in effect, a model of the domain (Fensel, 2001). 



2.1 Uses of Ontology 

Artificial ontology, then, is closely related to domain modeling, and in principle the 
two terms could be considered synonymous. This immediately suggests a connection 
to software reuse. Although domains are modeled for other purposes as well, one of
the purposes is to support software reuse (Deridder, 2002; Finni et al, 1997; Kyaruzi
& Katwijk, 1999; Menzies, 2002; Pernici et al, 2000; Savolainen, 2003; Souza et al,
2001). A domain model can inform the development of a domain-specific software
architecture and, eventually, a software component base that addresses current and
expected needs in the given domain (Falbo et al, 2002, 2002a; Musen, 1998).

Like domain models, however, artificial ontology can serve other purposes too. 
Perhaps the most widely discussed purpose, these days, is to support a form of data- 
driven programming in-the-large, in particular, applications on the World-Wide Web. 
This movement towards the semantic web involves articulating the types and proper- 
ties of objects that are included in web pages so that web applications can recognize 
and process them (Hendler, 2001). For example, through the use of XSL¹
transformations, a web page may be treated as a data source whose contents may be processed in
non-trivial ways, not only for the purpose of displaying them in a browser but also to 
compute a needed value, to pass the results to another application, and in general to 
perform the kinds of functions that we usually refer to as Information Processing 
(Fitzgerald, 2003). With the semantic web, the richness of machine-processable web 
object descriptions will, it is hoped, increase dramatically over current XML and 
XSLT capabilities. 

A broader use of artificial ontology is to support a more open approach to object- 
oriented programming. Applications on the semantic web may be considered one 
form such programming can take, but there are others. An artificial ontology provides 
a means for a class hierarchy to be exposed to applications developed in different 
frameworks (Musen, 1998). 



2.2 Implicit Ontology in Web-Based Component Libraries 

This is not the only way to view the semantic web. One could argue that artificial on- 
tologies are already present in the current, non-semantic web, although not highly 
formalized. Every time one navigates through a classification hierarchy of web links, 
one is at least implicitly traversing an ontology. In particular, implicit artificial
ontologies are present in web sites providing software application or component
catalogues, such as that shown in Figure 1.

Web-based catalogues function as taxonomies, organizing their contents into a hi- 
erarchy of categories. The meaning of each category is assumed to be understood by 
the searcher. The specialization relationship of the taxonomy provides some hint as to 
the meaning, but not much. 



2.3 Web-Based Discussion Groups and Contextual Software Knowledge 

Almost as an answer to the semantic deficiency of a web-based catalogue, another 
type of web site contains discussions of issues concerning the use of components. It is 
in such sites that we find the contextual knowledge missing from a lightweight taxonomy
of software components. The role of technical discussion sites is precisely the
post-search question-and-answer process described in Section 1. 



¹ XSL is an embedded acronym standing for Extensible Markup Language (XML) Stylesheet
Language.

Audio & Video: MP3 Search, Burners, Players, DVD Tools, Video...
Design & Photo: Image Editing, Flash Tools, Digital Photo...
Internet: Tools, Privacy, Pop-Up Blockers, Chat, Browsers...
Web Developer: HTML Editors, Site Management, PHP...
Games: Top Games, Action, Strategy, Arcade, Cards...
Software Developer: Tools & Editors, Java, ActiveX, XML...
Business: E-mail, Spam Filters, Taxes, Applications...
Utilities & Drivers: Drivers, Antivirus, Adware Removal, Security...
IS/IT: Remote Access, Sales, Internet Operations...
Desktop Enhancements: Screensavers, Desktop Management, Themes...
Mobile: Palm OS, Pocket PC, Cell Phone, Wireless...
Home & Education: Calendars, Language, Music, Teaching Tools...



Fig. 1. Web-based catalogues provide lightweight ontologies through their taxonomic organi- 
zation of components. This is the top level of the www.download.com taxonomy. 

Web-based discussion boards exist for topics other than software issues, of course. 
But their use for the discussion of technical software issues illustrates the potential 
role of ontology in structuring contextual knowledge. Figure 2, for example, shows 
one level of organization of the Sun Microsystems® Java Developers forum. This
example clearly indicates both the strengths and the weaknesses of such taxonomic
organization. On the one hand, there is an enormous amount of technical information to
be presented, and the taxonomy provides some guidance in locating the information 
one needs. On the other hand, the categories are not orthogonal, their meaning may be 
somewhat fuzzy, and in practice finding a discussion of a specific technical issue can 
be difficult and frustrating. 



3 Expressiveness and Knowledge Sharing 

The problem with the implicit ontologies described in the previous two sections is that 
they convey only minimal information. As the web evolves into the semantic web, we 
can expect the information to be enriched. To see how this will happen — and to set 
the stage for our main contention, which is that this is not sufficient for effective re- 
use — we briefly review the range of formality possible in an artificial ontology. 

At the highly informal end, concepts are simply named, perhaps with free-text de- 
scriptions meant for human but not machine consumption. A slightly greater degree of 
formality occurs when concepts are arranged in a taxonomy. We have seen that this 
level of semantics is already widespread on the web. 

A still greater degree of formality occurs when each concept is qualified by a set of
attributes. Each attribute takes one or more values, and if the values are typed then a
certain amount of machine processing or checking can occur. From a formal point
of view, an attribute may be considered as a function that maps an instance of the
concept to the value of the attribute. Besides attributes, an ontology may specify
relationships between concepts. These include generalization-specialization relationships
and whole-part relationships. Parts are often represented as attributes. Attributes may
also be used to represent other relationships, i.e., associations that are neither
generalization-specialization nor whole-part relationships.

Fig. 2. A taxonomy of technical discussion topics can be quite complex, even when focused on
a specific technology. This is part (about two-thirds) of the top level of the java.sun.com
developer forum site.

Describing attributes and relationships, therefore, involves rules that govern
admissible types. Other rules might govern number, that is, how many values an
attribute may take. The richer the available language for
making assertions about instances and attribute values, the more formal the ontology
may be. For example, the Process Specification Language is a completely formal
ontology in which concepts are described in first-order logic (Bock et al, 2003).
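
The intermediate levels of formality described above (typed attributes, rules on the
number of values, and generalization-specialization) can be sketched in a few lines of
Python. This is only an illustration under our own assumptions: the class names Concept
and Attribute, and the example concepts and attributes, are hypothetical and are not
taken from any particular ontology language.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    value_type: type
    min_count: int = 1  # rule on number: fewest values allowed
    max_count: int = 1  # rule on number: most values allowed


@dataclass
class Concept:
    name: str
    parent: "Concept | None" = None  # generalization-specialization link
    attributes: dict = field(default_factory=dict)

    def add_attribute(self, attr):
        self.attributes[attr.name] = attr

    def all_attributes(self):
        """Attributes declared here plus those inherited from ancestors."""
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.attributes}

    def check(self, instance):
        """Return rule violations for an instance (dict: name -> list of values)."""
        problems = []
        for attr in self.all_attributes().values():
            values = instance.get(attr.name, [])
            if not attr.min_count <= len(values) <= attr.max_count:
                problems.append(f"{attr.name}: wrong number of values ({len(values)})")
            for value in values:
                if not isinstance(value, attr.value_type):
                    problems.append(f"{attr.name}: {value!r} has the wrong type")
        return problems


# Hypothetical two-level taxonomy with typed attributes.
component = Concept("Component")
component.add_attribute(Attribute("license", str))
filter_concept = Concept("Filter", parent=component)
filter_concept.add_attribute(Attribute("input_formats", str, min_count=1, max_count=3))

good = {"license": ["GPL"], "input_formats": ["csv", "fits"]}
bad = {"license": [42], "input_formats": []}
print(filter_concept.check(good))
print(filter_concept.check(bad))
```

Here an attribute is literally a function from an instance to its values (a dictionary
lookup), and the machine checking mentioned in the text amounts to the type and count
tests in check.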



3.1 Component-Based Software Development on the Semantic Web 

The discussion of formalism in ontology allows us to envision what a software com- 
ponent site might look like on the semantic web. Referring back to Figure 1, each 
category would be a concept defined in OWL, the Web Ontology Language (W3C, 
2004). The semantic mark-up of each concept would include attributes that provide a 
richer description of the category than the plain-text category names. The additional
information would help both in automated searches and in interactive browsing
for a desired component.

A similar advance can be expected in the organization of discussion boards such as 
that shown in Figure 2. Rather than leaving the searcher to guess which of several possible
categories contains a particular technical issue, the attribute information can help clarify the
intended meaning of a category, thereby facilitating both machine and human search.



3.2 Why This Is Not Sufficient 

Proponents of the semantic web expect it to be a major advance in the
machine-processing of web pages, and perhaps also in focusing human navigation. However, it
is unlikely to carry software reuse to a new level of effectiveness. Component libraries
with this level of formality already exist, albeit not necessarily on the web. Certainly 
the description of components by means of attribute values is familiar to practitioners 
of reuse. 

As for the discussion boards, richer semantics may make it easier to find the dis- 
cussion of a specific technical issue, and this would be a significant improvement 
over the state-of-practice. The process, however, remains unchanged: the search for 
contextual information occurs separately from the search for a reusable product. Ei- 
ther the developer selects a candidate product, tries to reuse it, encounters problems, 
and searches a discussion board for explanations and workarounds, or she attempts to 
evaluate the product’s suitability in advance by searching a discussion site for past 
experiences by other developers. In the latter case, the search is probably less focused 
since the developer does not necessarily know what problems to expect, and therefore 
what issues to look for in the discussion base. 

In Sections 5 and 6 we propose a more integrated approach, in which the
meaning of categories is explored and gradually exposed through a conversational process
of ontology negotiation.



4 Example: Spectral Analysis Filters 

The author is part of a group of physical scientists, systems engineers, and AI practi- 
tioners who are developing a Spectral Analysis Automation (SAA) system. The goal 
of the SAA is to automate portions of the spectral analysis process currently per- 
formed by scientists. Spectral analysis is the process of interpreting remote sensing 
data from spacecraft orbiting various planetary bodies. In particular, the SAA group is 
working with data received from the NEAR spacecraft as it observed the asteroid 
Eros. 

The specific problem being addressed by the SAA system is to filter the vast quan- 
tity of information received through spacecraft observation instruments. Filtering is a 
crucial function in future asteroid (and other deep space) missions where the commu- 
nications bandwidth between the spacecraft and Earth is limited. It is especially im- 
portant for missions that will send up swarms of nano-satellites, which will have huge 
amounts of information that could be interesting to the scientific investigators. 

In the SAA, raw instrument data are filtered according to task-specific,
time-varying goals specified by the scientific investigators. The goal might be to achieve a
certain amount of geographical coverage, or to focus on information pertaining to
abundances of a particular mineral, or information pertaining to the ratio between 
abundances of two minerals. The scientific goals may compete with each other in the 
sense that a packet of information that is valuable for one goal may be useless for an- 
other. Accordingly, the SAA includes an arbitration process that takes account of the 
value of a given piece of information with respect to each current goal. 

The heart of the SAA system, then, consists of the goal-based evaluation functions 
that assign a value, or profit, to pieces of instrument data with respect to a given goal,
and the arbitration algorithm that assigns an arbitrated profit taking into account all 
goals. The architecture of the system is shown in Figure 3. 
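
To make the roles of evaluation and arbitration concrete, the following sketch shows one
possible shape for them. It is not the SAA implementation: the priority-weighted mean
used for arbitration, the greedy selection by profit per unit of size, and all names
(Packet, arbitrate, select_packets, the two goal functions) are assumptions chosen for
illustration only.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    packet_id: str
    size: int   # transmission cost, in arbitrary bandwidth units
    data: dict  # instrument readings carried by the packet


def arbitrate(packet, goals):
    """Arbitrated profit: priority-weighted mean of per-goal profits (assumed rule)."""
    total_priority = sum(priority for _, priority in goals)
    weighted = sum(priority * evaluate(packet) for evaluate, priority in goals)
    return weighted / total_priority


def select_packets(packets, goals, bandwidth):
    """Greedily pick packets with the best arbitrated profit per unit of size."""
    ranked = sorted(packets, key=lambda p: arbitrate(p, goals) / p.size, reverse=True)
    chosen, used = [], 0
    for packet in ranked:
        if used + packet.size <= bandwidth:
            chosen.append(packet)
            used += packet.size
    return chosen


# Hypothetical goal-based evaluation functions, each returning a profit in [0, 1].
def iron_abundance_goal(packet):
    return min(packet.data.get("iron", 0.0), 1.0)


def coverage_goal(packet):
    return 1.0 if packet.data.get("new_region", False) else 0.0


goals = [(iron_abundance_goal, 3), (coverage_goal, 1)]  # (function, priority)
packets = [
    Packet("a", size=4, data={"iron": 0.9, "new_region": False}),
    Packet("b", size=2, data={"iron": 0.1, "new_region": True}),
    Packet("c", size=4, data={"iron": 0.2, "new_region": False}),
]
selected = select_packets(packets, goals, bandwidth=6)
print([p.packet_id for p in selected])
```

With these illustrative numbers, packet b scores poorly on the higher-priority goal but
is still selected because it is small and valuable for coverage, which is the kind of
competition between goals that arbitration is meant to resolve.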

We want the scientific investigators to be able to specify the appropriate goal- 
based evaluation functions, as well as to specify the method of arbitrating between 
conflicting goals. We have, therefore, developed a user interface (UI) through which 
the scientist can perform a certain level of component-based programming. The UI 
provides the scientist with access to a database of Evaluation Support Components 
(ESCs), which implement recurrent functions likely to be needed in defining a goal- 
based evaluation function. 

Each ESC is described through suggestive naming, free-text descriptions, and a set 
of attributes. While the attributes fall short of semantically characterizing the func- 
tions performed by each ESC, we envision a richer, ontology-based description of 
each ESC in future versions of the SAA. Given the domain of focus, which is essen- 
tially a combination of physics and chemistry, the use of a formal ontology seems 
eminently achievable. 

The basic UI scenario consists of the scientist browsing or searching through the 
ESCs, and “programming” them through composition and through boolean and tem- 
poral combinations in order to create an evaluation function, corresponding to a par- 
ticular scientific goal. In the process of defining a new evaluation function, the scien- 
tist may create new ESCs. Even a new evaluation function may become an ESC if it is 
potentially reusable as a component of still other evaluation functions. The question, 
then, is whether the available ESC descriptions, or even those we envision provided 
through a formal ontology, will suffice to allow scientists to create the desired 
evaluation functions. 
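
The kind of "programming" described here can be pictured as combining small predicates.
The sketch below is our own illustration, not the SAA user interface: the combinators
(both, either, negate, during, as_profit), the packet fields and the two example ESCs
are hypothetical.

```python
def both(f, g):
    """Boolean AND of two ESC predicates."""
    return lambda packet: f(packet) and g(packet)


def either(f, g):
    """Boolean OR of two ESC predicates."""
    return lambda packet: f(packet) or g(packet)


def negate(f):
    """Boolean NOT of an ESC predicate."""
    return lambda packet: not f(packet)


def during(f, start, end):
    """Temporal combination: f counts only for packets time-stamped in [start, end)."""
    return lambda packet: start <= packet["time"] < end and f(packet)


def as_profit(predicate, profit=1.0):
    """Turn a predicate into an evaluation function that assigns a profit."""
    return lambda packet: profit if predicate(packet) else 0.0


# Hypothetical ESCs drawn from the component database.
def high_iron(packet):
    return packet["iron"] > 0.5


def low_noise(packet):
    return packet["snr"] > 10.0


# A new evaluation function built by composition; it could itself be stored as an ESC.
clean_iron_in_window = during(both(high_iron, low_noise), start=100, end=200)
evaluate = as_profit(clean_iron_in_window)

print(evaluate({"time": 150, "iron": 0.8, "snr": 12.0}))
print(evaluate({"time": 250, "iron": 0.8, "snr": 12.0}))
```

Because every combinator returns a function of the same shape as its inputs, a composed
evaluation function can be reused as a building block for further compositions.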



5 Reuse as Ontology Negotiation 

The problem is that even in a formal ontology, there are different ways to express the 
same idea. A mathematical function can be decomposed in different ways. Even a 
concept as seemingly precise as “iron abundance” can mean different things depend- 
ing on what iron-containing compounds one is referring to (Bailin & Truszkowski, 
2004). In the SAA UI, the scientist may look at the description of an ESC and not be 
certain whether it really is what he needs. This is an instance of the problem with 
which we opened this paper, a problem of ontology mismatch.

Fig. 3. The heart of the SAA system is the arbitrated evaluation process. Science data (spectra)
are received from spacecraft instruments and are placed into packets along with engineering 
and tracking and ranging data. Each evaluation function assigns a profit to each packet. The ar- 
biter then assigns an arbitrated profit, taking into account the relative priorities of the goals. The 
packet selector then chooses packets for download to Earth, taking into account the arbitrated 
profits as well as the available communications bandwidth. 

There are several ways to handle problems of ontology mismatch (Bailin & Trusz- 
kowski, 2002). We have, however, not seen many applications of these ideas to soft- 
ware reuse. An exception is Braga et al (2001), who have applied the idea of media- 
tors to heterogeneous component ontologies. Mediators are automated translators that 
map terms from one ontology to those of another (Wiederhold, 1992). 

In the reuse scenario, the problem with mediators, as with other methods of re- 
solving ontology mismatch, is the assumption of a fixed set of ontologies that must be 
reconciled. Typically these represent distinct but overlapping domains. In the reuse 
scenario, however, we have just one articulated ontology, that of the component base, 
and the implicit ontology in the developer’s mind. The latter is not a single set of al- 
ternative terminology; it is, rather, a fluid world-view that varies among different 
people and perhaps within a single person at different times or in different task con- 
texts. 

Ontology negotiation is a technology we have proposed in order to handle ontology 
mismatch in a more fluid manner. The idea grew out of our work on the LIBRA 
methodology, which is a set of techniques for fostering reuse-oriented knowledge 
sharing between people (Bailin et al, 2000). Ontology negotiation is an attempt to 
move the LIBRA paradigm into human-to-computer or computer-to-computer com- 
munications. 

The basic idea is to inquire about meaning through questions of the sort identified 
in Section 1. Ontology negotiation attempts to provide a structure for such inquiries,
defining a protocol for successive refinement of shared understanding. The core of the
protocol is shown in Figure 4. Each party in the negotiation can ask the other party to
clarify some or all of a previous utterance. Alternatively, either party can interpret 
what the other party has said, and ask the other party whether the interpretation is cor- 
rect. 

Clarification of an utterance consists of replacing certain terms by others. Thus, a 
clarification takes the form, “By a I mean b” where b can be a synonym, logical defi- 
nition, specialization, slight generalization, or example of a. If this clarification is still 
not understood, the recipient of the utterance can request further clarification. 

Alternatively, the recipient can proffer an interpretation of the utterance and ask for 
confirmation that the interpretation is correct. An interpretation takes the form, “I 
think that by a you mean b. Is this true?” In response, the originator of the utterance 
can confirm the correctness of the interpretation, or offer a clarification. The origina- 
tor may also not fully understand the interpretation, in which case the originator can 
request clarification, i.e., “What do you mean by A?” The negotiation process is there- 
fore both iterative and recursive. It converges when the communicating parties be- 
lieve they understand each other sufficiently well, or as well as they are ever going to. 
More details may be found in Bailin & Truszkowski (2002).

Fig. 4. The core of ontology negotiation is an iterative and recursive exchange of interpretations
and clarifications. 
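
The clarification half of the protocol can be sketched as a simple loop. This is a
simplified illustration rather than the protocol of Bailin & Truszkowski (2002): it
omits the interpretation and confirmation moves, represents an utterance as a list of
terms, and uses hypothetical names (Party, negotiate) and vocabularies.

```python
class Party:
    def __init__(self, name, vocabulary, glossary):
        self.name = name
        self.vocabulary = set(vocabulary)  # terms this party understands
        self.glossary = dict(glossary)     # term -> replacement terms it can offer

    def unknown_terms(self, utterance):
        return [term for term in utterance if term not in self.vocabulary]

    def clarify(self, term):
        """'By a I mean b': return the replacement terms for a, or None."""
        return self.glossary.get(term)


def negotiate(speaker, listener, utterance, max_rounds=10):
    """Replace terms the listener does not understand until it understands
    everything, the speaker runs out of clarifications, or max_rounds is hit.
    Returns (final_utterance, understood, transcript)."""
    utterance = list(utterance)
    transcript = []
    for _ in range(max_rounds):
        unknown = listener.unknown_terms(utterance)
        if not unknown:
            return utterance, True, transcript
        term = unknown[0]
        transcript.append(f"{listener.name}: What do you mean by '{term}'?")
        replacement = speaker.clarify(term)
        if replacement is None:
            transcript.append(f"{speaker.name}: I cannot clarify '{term}'.")
            return utterance, False, transcript
        phrase = " ".join(replacement)
        transcript.append(f"{speaker.name}: By '{term}' I mean '{phrase}'.")
        position = utterance.index(term)
        utterance[position:position + 1] = replacement
    return utterance, False, transcript


user = Party(
    "User",
    vocabulary={"filter", "data", "tunable", "high-quality"},
    glossary={"tunable": ["retrainable", "while", "running"],
              "high-quality": ["good-signal"]},
)
system = Party(
    "SAA",
    vocabulary={"filter", "data", "good-signal", "retrainable", "offline"},
    glossary={},
)

final, understood, transcript = negotiate(user, system, ["high-quality", "data"])
print(final, understood)
for line in transcript:
    print(line)
```

A clarification may itself contain terms the listener does not know, which is what makes
the exchange recursive; when no further clarification is available, the loop ends
without agreement, and that outcome is itself informative about the risk of a mismatch.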

Our contention, based on the argument in Section 1, is that software reuse is
essentially a process of ontology negotiation. It is a process of teasing out the meaning of
the need and the meaning of the candidate product descriptions, and the relationship 
between the two. We can understand this in terms of the two types of web sites dis- 
cussed in Sections 2.2 and 2.3, component catalogues and technical discussion fo- 
rums. The developer finds a potentially reusable product in a catalogue, which de- 
scribes the product in greater or lesser detail. But the developer still has questions. 
She consults a discussion forum for this type of product. If she finds answers, her un- 
derstanding of the original product description is deepened. With each question, and 
with each answer found, her understanding deepens. Or, if answers are not found, she 
has a better understanding of the risks involved in reusing the product. 

The problem is that the ontology negotiation remains implicit. The discussion fo- 
rum is not responsible for the original product description. Its charter is not to clarify 
the meaning of the product description, but simply to provide an arena for asking and 
answering questions, or for reporting on experiences. The ontology of issues remains 
separate from the ontology of components. In the SAA project, we are experimenting 
with user interface approaches that would bridge this divide. We describe these ideas, 
with which we are only beginning to experiment, in the following section.



6 Towards a Conversational Reuse Interface

We are looking at developing a conversational interface that supports reuse as ontol- 
ogy negotiation. The interface would allow the user first to locate a candidate compo- 
nent and then to interrogate the system about its properties. The properties we have in 
mind are not those that are explicitly represented in the domain ontology, but rather 
those that are not so represented, for the reasons discussed in Section 1.

In the envisioned conversational interface, the user asks questions about a candi- 
date reusable product. She may, for example, request clarification of the product’s de- 
clared attributes. The system responds to such questions according to the ontology 
negotiation protocol, drawing on available thesauri and other contextual knowledge. 

The conversational interface will guide the question-and-answer process so that 
each successive question reveals, to the component base, some ambiguity or incom- 
pleteness in the product description; and each successive answer clarifies, for the user 
and for the component base, the meaning of some aspect of the product description. 
Here is an example of what the conversational interface might look like. While this 
example suggests a natural language user interface, a more restricted interface may 
still provide the conversational flavor that we aim for: 

User: I'm looking for a filter that sends through only high quality data. 

SAA: "High quality" can mean either "good signal" or "relevant content." What do you mean? 

User: Good signal. 

SAA: TMC_NEAR_NN03 is a neural network based filter that passes through good signal.

User: Is it tunable? 

SAA: What do you mean by "tunable"? 

User: Can it learn? 

SAA: TMC_NEAR_NN03 was trained by a back propagation algorithm. 

User: But can it be retrained as I use it? 

SAA: TMC_NEAR_NN03 can be retrained by changing the input or output data. 

User: Can it be retrained as it runs? 

SAA: I don't understand this question. 

User: Is the training software part of the runtime package? 

SAA: The training software is available. 

User: Can the network be changed once the filter is running? 

SAA: I don’t know. Let me point you to the relevant part of the documentation. 

In this example, the scientist is looking for a filter that removes low-quality data, 
by which he means data with a low signal-to-noise ratio. Distinguishing signal from
noise is an inexact process; the scientist wants a component that includes a learning
function so that the algorithm for detecting signal can be improved over time. The
SAA component manager finds a filter, called TMC_NEAR_NN03, that has been
trained — hence it is capable of learning. This is not, however, specific enough for the 
scientist, who needs to know whether the component can be retrained as it is running. 

As a result of this process, the scientist should be able to make a better-informed 
choice as to whether to reuse the product TMC_NEAR_NN03. While the system has not
definitively answered his question, its failure to do so has highlighted a potential risk
in reusing the component. At the same time, the system can evolve its ontology so
that the product description is clarified (and perhaps other product descriptions, as
well). In this case, the ontology might evolve so that a distinction is made between
one-time trainability, the ability to be retrained offline, and the ability to be retrained 
while running. 

This is an ambitious vision, and our work towards it has only just begun. The vi- 
sion grew out of the team’s experience in developing a domain-specific language for 
the SAA. This process repeatedly revealed ambiguities and disagreements about ter- 
minology and semantics. At the very least, a system that helps to identify and high- 
light semantic disconnects will help to reduce risk in reusing components. Our hy- 
pothesis is that a combination of natural language processing and ontology 
negotiation can serve this purpose. 



7 Conclusion 

We have presented the search for reusable components as a process of matching con- 
cepts in ontologies that may not be obviously congruent. We have suggested that 
many of the challenges of reuse stem from the fact that descriptions of need and de- 
scriptions of reusable products are typically incomplete. Formal ontology can help en- 
rich the descriptions, but what is really needed is a process of teasing out the implicit 
semantics of both needs and products. We proposed that the ideas of ontology nego- 
tiation, originally developed to support information retrieval, could be applied within 
a conversational user interface to support such a process.



Abstract. To realize appropriate software reuse, it is necessary to seek
software that satisfies a given requirement. However, conventional search 
techniques cannot enable prompt reuse of software because such con- 
ventional techniques target the program source code as the retrieval 
unit. In this paper, we propose a new component-extraction-based pro- 
gram search system. Our system analyses a collection of object-oriented 
(OO) programs, acquires relationships among OO classes, and extracts
reusable software components composed of some classes. Moreover, our 
system generates indexes composed of divided type names and comments 
for newly extracted components. Using our system, the extracted com- 
ponents can be searched by keywords, and the result set can be viewed 
by a web browser such that the user can decide whether the query result 
component matches his/her requirements. 



1 Introduction 

If programs that satisfy a new requirement can be searched from available pro- 
grams, programmers can effectively reuse these programs. There are many search 
techniques [1,2,3,4,5]; however, almost all conventional techniques target an
individual source code as a retrieval unit. When a given requirement is satisfied
by a set of source codes, users of conventional search systems must examine the 
dependences and acquire other source codes on which the source code as a result 
of the search depends. Since such additional activities entail a high cost for users, 
many reusable parts are thought to be buried in existing source codes. 

In this paper, the target of reuse is program source code written in
object-oriented (OO) languages, particularly Java. A search system for OO
programs should satisfy the following functional requirements.

— Presents a list of programs that relate to the given requirement. 

— Provides a program that can be reused without any additional programs. 

— Presents a facade class in the acquired program, that plays the role of con- 
trolling other classes. 

To satisfy all the above-mentioned functional requirements, the search system 
should use a set of related source codes as a retrieval unit. Such search results 
must be provided in the form of a component that can be immediately executed. 



J. Bosch and C. Krueger (Eds.): ICSR 2004, LNCS 3107, pp. 254-263, 2004.
(c) Springer-Verlag Berlin Heidelberg 2004
2 Component-Extraction-Based Search System

We propose a new program search system based on a component extraction 
technique for a collection of existing Java source codes. Our system targets 
JavaBeans[6] as a component architecture. JavaBeans allows software developers 
to construct Java applications by piecing components together. 

Figure 1 shows the architecture of our system. The entire process of
retrieving components consists of four steps: static analysis, component
extraction, source indexing, and component search.




Fig. 1. Architecture of the component-extraction-based search system 

2.1 Class Relation Graph (CRG) 

Definition 1 (Class Relation Graph) 

The multigraph that satisfies all the following requirements is a CRG, denoted
as $\Gamma = (V, A, E)$, where $V$ is a set of class/interface nodes, $A$ is a
set of sequential numbers for identification of edges, and $E$ is a set of
directed edges that are ordered trios of a source node, destination node, and
label name. Classes in Java core packages (e.g., javax.swing.*) do not become
nodes in the CRG, because the core packages are distributed with standard Java
runtime systems.

1. $V = V_C \cup V_I$. $V_C$ is the set of class nodes corresponding to
classes. $V_I$ is the set of interface nodes corresponding to interfaces. All
nodes are denoted as rectangles that have class/interface names inside when
the CRG is shown in the form of a figure. In the following, a node (on the
CRG) that corresponds to class or interface $c$ is described as $Node(c)$. In
contrast, a class or interface that corresponds to node $v$ on the CRG is
described as $Node^{-1}(v)$.

2. $E = E_E \cup E_I \cup E_R$. $E$ consists of a set of inheritance edges
($E_E$), a set of instantiation edges ($E_I$), and a set of reference edges
($E_R$).

3. $E_E \subseteq V_C \times V \cup V_I \times V_I$. $E_E$ is a set of
inheritance edges, which indicate that a class inherits another
class/interface, or an interface inherits another interface, and is denoted as
$\rightarrow$ (solid arrow).

4. $E_I \subseteq V \times V_C \times A$. $E_I$ is a set of instantiation
edges, which indicate that a class/interface instantiates an object of another
class with a unique label for identification, and is denoted as $\cdots>$
(dotted arrow).

5. $E_R \subseteq V \times V \times A$. $E_R$ is a set of reference edges,
which indicate that a class/interface refers to another class/interface with a
unique label for identification, and is denoted as $\dashrightarrow$ (dashed
arrow). Our system recognizes that a class/interface $c_a$ refers to another
class/interface $c_b$ when there is a type specification of $c_b$ for
variable/method declarations in $c_a$, or there is a type specification of
$c_b$ for type casts in $c_a$, or $c_a$ accesses a method/field of $c_b$.
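As an illustration of Definition 1, the following minimal Java sketch stores a CRG as two node sets and a list of labelled edges, and rejects edges that the definition forbids. All class and method names here (`ClassRelationGraph`, `CrgEdge`, `EdgeKind`, `isCore`) are hypothetical and not taken from the described system; treating every name under `java.` or `javax.` as a core package is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** The three edge kinds of Definition 1. */
enum EdgeKind { INHERITANCE, INSTANTIATION, REFERENCE }

/** One directed edge; instantiation and reference edges carry a label from A. */
final class CrgEdge {
    final String source;
    final String target;
    final EdgeKind kind;
    final int label; // 0 for inheritance edges, which carry no label

    CrgEdge(String source, String target, EdgeKind kind, int label) {
        this.source = source;
        this.target = target;
        this.kind = kind;
        this.label = label;
    }
}

/** Minimal CRG; nodes are identified by type name. */
final class ClassRelationGraph {
    private final Set<String> classNodes = new LinkedHashSet<>();     // V_C
    private final Set<String> interfaceNodes = new LinkedHashSet<>(); // V_I
    private final List<CrgEdge> edges = new ArrayList<>();            // E
    private int nextLabel = 1;                                        // next element of A

    /** Assumption: "core package" means any name under java. or javax. */
    static boolean isCore(String typeName) {
        return typeName.startsWith("java.") || typeName.startsWith("javax.");
    }

    /** Core-package classes never become nodes. */
    void addClass(String name) {
        if (!isCore(name)) classNodes.add(name);
    }

    void addInterface(String name) {
        if (!isCore(name)) interfaceNodes.add(name);
    }

    boolean hasNode(String name) {
        return classNodes.contains(name) || interfaceNodes.contains(name);
    }

    /** Adds an edge and returns true, or returns false if Definition 1 forbids it. */
    boolean addEdge(String source, String target, EdgeKind kind) {
        if (!hasNode(source) || !hasNode(target)) return false;
        // E_E: a class may inherit any node, an interface only another interface.
        if (kind == EdgeKind.INHERITANCE
                && interfaceNodes.contains(source) && !interfaceNodes.contains(target)) {
            return false;
        }
        // E_I: only classes can be instantiated.
        if (kind == EdgeKind.INSTANTIATION && !classNodes.contains(target)) return false;
        int label = (kind == EdgeKind.INHERITANCE) ? 0 : nextLabel++;
        edges.add(new CrgEdge(source, target, kind, label));
        return true;
    }

    List<CrgEdge> edges() {
        return Collections.unmodifiableList(edges);
    }
}
```

Because the graph is a multigraph, two classes may be connected by several reference or instantiation edges; the sequential label is what keeps those edges distinct.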

For example, Figure 2(a) shows the CRG of the Prototype pattern sample
code[7] written in Java.



[Figure 2 legend: A $\rightarrow$ B: A inherits B. A $\cdots>$ B: A
instantiates an object of B. A $\dashrightarrow$ B: A refers to B.]

Fig. 2. (a) CRG (b) Extracted component



Definition 2 (Reachability) 

In an arbitrary CRG, node $v$ is inheritance-reachable (instantiation-reachable)
from node $u$ when there is a directed path from $u$ to $v$ and all edges in
the path are inheritance edges (instantiation edges), or $u = v$, denoted as
$u \xrightarrow{*} v$ ($u \overset{*}{\cdots>} v$). Similarly, node $v$ is
dependence-reachable from node $u$ when there is a directed path from $u$ to
$v$ and all edges in the path are inheritance/instantiation/reference edges,
or $u = v$, denoted as $u \overset{*}{\Rightarrow} v$.

In addition, we say that $Node^{-1}(v)$ is reachable from $Node^{-1}(u)$ when
$v$ is reachable from $u$ in the corresponding CRG.
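The three reachability relations of Definition 2 differ only in which edge kinds a path may use, so a single breadth-first search parameterized by the allowed kinds covers all of them. The Java sketch below shows this; the class `Reachability` and the node names used with it are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class Reachability {
    enum Kind { INHERITANCE, INSTANTIATION, REFERENCE }

    private static final class Edge {
        final String from;
        final String to;
        final Kind kind;

        Edge(String from, String to, Kind kind) {
            this.from = from;
            this.to = to;
            this.kind = kind;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    /** Adds a directed edge and returns this object so calls can be chained. */
    Reachability edge(String from, String to, Kind kind) {
        edges.add(new Edge(from, to, kind));
        return this;
    }

    /** Breadth-first search that follows only edges of the allowed kinds. */
    private Set<String> reachable(String start, Set<Kind> allowed) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        seen.add(start); // the case u = v: every node reaches itself
        work.add(start);
        while (!work.isEmpty()) {
            String u = work.remove();
            for (Edge e : edges) {
                if (e.from.equals(u) && allowed.contains(e.kind) && seen.add(e.to)) {
                    work.add(e.to);
                }
            }
        }
        return seen;
    }

    Set<String> inheritanceReachable(String u) {
        return reachable(u, EnumSet.of(Kind.INHERITANCE));
    }

    Set<String> instantiationReachable(String u) {
        return reachable(u, EnumSet.of(Kind.INSTANTIATION));
    }

    Set<String> dependenceReachable(String u) {
        return reachable(u, EnumSet.allOf(Kind.class));
    }
}
```

Scanning the whole edge list for each visited node keeps the sketch short; an adjacency map would be the natural choice for large graphs.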



2.2 Definition of Component 

Among various sets of OO classes, there are sets that have no hotspots and
have a uniform utilization procedure. Such a set of classes can be recognized
as a component based on a particular component architecture. Our system
targets JavaBeans as a component architecture.

Definition 3 (Reusable Component) 

A reusable component is a set of Java classes/interfaces that satisfies all the
following requirements 1 to 4, and is packaged into one JAR file. The reusable
component has no dependence on elements outside of the component, and can
be instantiated and used alone.

1. The classes can be treated as a JavaBeans component. The definition of the
JavaBeans component has four requirements. First, the set must include one
Facade class, which acts as the facade of the set toward the outside, based on
the Facade pattern[7]. Second, the Facade class must have one public default
constructor (without any arguments). Third, the Facade class must implement
the java.io.Serializable interface. Fourth, the set should be packaged into
one JAR file including a manifest file that explicitly specifies the Facade class.

However, requirement 1 allows components to depend on classes that are
not included in the component’s JAR file. Moreover, this requirement does not
provide any means of using the components’ functions without understanding
the content of components. Therefore, we add the following requirements 2 to 4
to ensure that the target component is indeed reusable.

2. An interface (Facade interface) declaration of the Facade class is sepa- 
rated from its implementation. This separation ensures that any implementation 
change for the same interface does not influence the client sides of the compo- 
nent. Moreover, a user needs only understand the Facade interface. 

3. All classes/interfaces necessary for instantiating an object of the Facade class 
must be packaged into the component’s JAR file. The entire component must 
be instantiated by invoking a default constructor of the Facade class. 

4. All classes in the set, except static classes, must be instantiation-reachable 
from the Facade class. A static class is a class of which all fields and methods 
are declared with the modifier static. In the following, the expression Static(c) 
is true if class c is a static class. This requirement decreases the necessity of 
instantiating and passing objects of the component’s participant classes from 
the outside of the component to the component. Therefore, for users who want 
to reuse components, the necessity of understanding the content of components 
can be decreased. 

Even if some of the component’s participant classes are not
instantiation-reachable from the Facade class, it is possible to instantiate an
object of the Facade class. Thus, we call a set of Java classes/interfaces that
satisfies all of the above requirements “the strong component.” We call a set
that satisfies requirements 1 to 3 but does not satisfy requirement 4 “the
weak component.” Our system treats both strong and weak components as
retrieval units.



2.3 Detection of Clusters 

By judging the reachability on the CRG, our system detects all strong/weak 
clusters that are candidates of strong/weak components. 

Definition 4 (Strong/Weak Cluster) 

The pair of the Facade node $Node(c)$ and the set of nodes $V_c$ is a strong
cluster, denoted as $cs = (Node(c), V_c)$, where all of the following
requirements (a) to (c) are satisfied. The strong cluster is a candidate for a
strong component. In addition, the pair of $Node(c)$ and $V_c$ is a weak
cluster, denoted as $cw = (Node(c), V_c)$, where requirements (a) and (b) are
satisfied but (c) is not. The weak cluster is a candidate for a weak component.






H. Washizaki and Y. Fukazawa 



(a) Class $c$ is a concrete class that has a default constructor or a
constructor with only primitive types or classes in the core packages as its
arguments (called “basic constructor”). This requirement is expressed as
$Default(c)$.

(b) $V_c$ is the set of all nodes that are dependence-reachable from $Node(c)$.

(c) All nodes in $V_c$, except nodes corresponding to static classes, are
instantiation-reachable from $Node(c)$.

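The following clustering algorithm $GC(u, \Gamma)$ specifies the set of all
possible strong clusters ($C_s = \{cs_1, \ldots, cs_m\}$) and the set of all
possible weak clusters ($C_w = \{cw_1, \ldots, cw_n\}$) using a given node $u$
as the starting node for exploration.

$GC(u, \Gamma)$
    $C_s \leftarrow \emptyset$; $C_w \leftarrow \emptyset$; $V_d \leftarrow \{v \mid u \overset{*}{\Rightarrow} v\}$;
    for_each $v_f \in V_d$ do
        if $Default(Node^{-1}(v_f)) \land v_f \overset{*}{\Rightarrow} u \land (\forall v' \mid v' \in V_d \Rightarrow (v_f \overset{*}{\cdots>} v' \lor Static(Node^{-1}(v'))))$
            then $C_s \leftarrow C_s \cup \{(v_f, V_d)\}$;
            else $C_w \leftarrow C_w \cup \{(v_f, V_d)\}$;
        end_if
    end_for
    return $C_s$, $C_w$;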

For example, we obtained two strong clusters by applying $GC(u, \Gamma)$ to
the CRG $\Gamma$ shown in Figure 2(a) using all nodes as starting nodes
($u \in V$): $cs_1$ = (Room, {Room, MapSite}), and $cs_2$ = (MazeFactory,
{MazeFactory, Room, MapSite, Door, Maze, Wall}). Similarly, we obtained four
weak clusters whose Facade nodes are BombedWall, Door, Maze, and Wall.
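The cluster conditions can be sketched in Java as follows. This is an illustrative reading of Definition 4 rather than the system's own implementation: it treats every node as a candidate Facade node, requires condition (a) for weak clusters as well as strong ones, and applies the reachability of Definition 2 literally. The class `ClusterDetector`, its methods, and the node names in the usage note are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ClusterDetector {
    enum Kind { INHERITANCE, INSTANTIATION, REFERENCE }

    /** A detected cluster: Facade node, member nodes V_c, and strong/weak flag. */
    static final class Cluster {
        final String facade;
        final Set<String> members;
        final boolean strong;

        Cluster(String facade, Set<String> members, boolean strong) {
            this.facade = facade;
            this.members = members;
            this.strong = strong;
        }
    }

    private static final class Edge {
        final String from;
        final String to;
        final Kind kind;

        Edge(String from, String to, Kind kind) {
            this.from = from;
            this.to = to;
            this.kind = kind;
        }
    }

    private final Set<String> nodes = new LinkedHashSet<>();
    private final Set<String> basicConstructible = new HashSet<>(); // Default(c) holds
    private final Set<String> staticClasses = new HashSet<>();      // Static(c) holds
    private final List<Edge> edges = new ArrayList<>();

    void addNode(String name, boolean hasBasicConstructor, boolean isStatic) {
        nodes.add(name);
        if (hasBasicConstructor) basicConstructible.add(name);
        if (isStatic) staticClasses.add(name);
    }

    void addEdge(String from, String to, Kind kind) {
        edges.add(new Edge(from, to, kind));
    }

    private Set<String> reachable(String start, Set<Kind> allowed) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>();
        seen.add(start);
        work.add(start);
        while (!work.isEmpty()) {
            String u = work.remove();
            for (Edge e : edges) {
                if (e.from.equals(u) && allowed.contains(e.kind) && seen.add(e.to)) {
                    work.add(e.to);
                }
            }
        }
        return seen;
    }

    /** Returns one cluster per node that satisfies condition (a). */
    List<Cluster> detect() {
        List<Cluster> clusters = new ArrayList<>();
        for (String facade : nodes) {
            if (!basicConstructible.contains(facade)) continue;                  // (a)
            Set<String> members = reachable(facade, EnumSet.allOf(Kind.class));  // (b)
            Set<String> instantiated = reachable(facade, EnumSet.of(Kind.INSTANTIATION));
            boolean strong = true;                                               // (c)
            for (String v : members) {
                if (!instantiated.contains(v) && !staticClasses.contains(v)) strong = false;
            }
            clusters.add(new Cluster(facade, members, strong));
        }
        return clusters;
    }
}
```

With nodes `Facade` (instantiates `Helper`), `Helper` (refers to the static class `Util`), and `Other` (only refers to `Helper`), the sketch reports `Facade` as the Facade node of a strong cluster and `Other` as the Facade node of a weak one, because `Helper` is not instantiation-reachable from `Other`.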



2.4 Component Extraction Procedure 

All steps of the extraction procedure are shown below.

1. Our system generates the CRG $\Gamma$ of the target collection of source code.

2. Our system detects all strong clusters and weak clusters by applying
$GC(u, \Gamma)$ using all nodes in $\Gamma$ as starting nodes.

3. Our system performs the following steps (a) to (i) for each detected cluster
$(v_f, V_c)$. In the following, class $Node^{-1}(v_f)$ is described as $c_f$.

(a) Our system creates a new Facade interface $i_f$ whose name is I+($c_f$’s
name), and makes $c_f$ implement $i_f$.

(b) Our system adds to $i_f$ the declarations of all public methods implemented
within the classes $C_e$ that are inheritance-reachable from $c_f$
($C_e = \{Node^{-1}(v) \mid (v_f \xrightarrow{*} v) \text{ in } \Gamma\}$).

(c) Our system adds to $i_f$ the declarations of the setter methods and getter
methods corresponding to all public fields of $C_e$.

(d) Our system adds to $c_f$ the implementations of the setter methods and
getter methods that were newly declared within $i_f$ at step (c). The setter
method is named set+(field’s name), and is simply implemented to change
the value of the corresponding field using an input value. The getter method
is named get+(field’s name), and is simply implemented to return the value
of the corresponding field.

(e) Our system sets the access modifier of $c_f$ to public.

(f) Our system makes $c_f$ implement the Serializable interface.

(g) If $c_f$ has only a basic constructor, our system adds to $c_f$ a new
public default constructor, which invokes the basic constructor using the
initial values of the argument types for its arguments. The initial value of
each type is uniquely defined in the following way.

Primitive type: The value that a variable of the type has when it is only
declared becomes the initial value. For example, the initial value of int is
zero.

Class in the core packages: Our system attempts to obtain the initial value
by invoking the constructor that has the fewest arguments among all
constructors of the type, recursively using the initial values corresponding
to the types of the constructor arguments.

Otherwise, null is used as the initial value.

(h) Using a Java compiler, our system compiles the source code of the modified
$c_f$ and the newly created $i_f$.

(i) Our system creates a manifest file that specifies $c_f$, and packages all
class files of classes/interfaces in $V_c$ and $i_f$ into one JAR file. The
name of $c_f$ becomes the component’s name.

By the above-described procedure, our system extracts reusable components 
that can be instantiated independently of other classes. 
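The initial-value rule of step (g) can be sketched with Java reflection as below. This is an illustrative approximation, not the system's code: the class name `InitialValues`, the recursion depth limit, the restriction to public constructors, and the choice to fall back to null when construction fails are all assumptions.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Constructor;

final class InitialValues {
    private static final int MAX_DEPTH = 8; // assumed guard against endless recursion

    /** Returns the initial value for a constructor argument of the given type. */
    static Object initialValue(Class<?> type) {
        return initialValue(type, 0);
    }

    private static Object initialValue(Class<?> type, int depth) {
        if (type.isPrimitive()) {
            // A fresh one-element array holds the default value of the primitive type.
            return Array.get(Array.newInstance(type, 1), 0);
        }
        if (depth > MAX_DEPTH || !isCore(type)) {
            return null; // "Otherwise, null is used as the initial value."
        }
        // Pick the public constructor with the fewest arguments.
        Constructor<?> best = null;
        for (Constructor<?> candidate : type.getConstructors()) {
            if (best == null || candidate.getParameterCount() < best.getParameterCount()) {
                best = candidate;
            }
        }
        if (best == null) return null;
        Class<?>[] parameterTypes = best.getParameterTypes();
        Object[] arguments = new Object[parameterTypes.length];
        for (int i = 0; i < parameterTypes.length; i++) {
            arguments[i] = initialValue(parameterTypes[i], depth + 1);
        }
        try {
            return best.newInstance(arguments);
        } catch (ReflectiveOperationException | RuntimeException e) {
            return null; // assumed fallback when the constructor cannot be invoked
        }
    }

    /** Assumption: "core package" means any name under java. or javax. */
    private static boolean isCore(Class<?> type) {
        String name = type.getName();
        return name.startsWith("java.") || name.startsWith("javax.");
    }
}
```

For example, `InitialValues.initialValue(int.class)` yields the boxed value 0, and `InitialValues.initialValue(String.class)` yields the empty string produced by the no-argument `String` constructor.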

For example, Figure 2(b) shows a UML class diagram of a newly extracted
component (named “Room”) that corresponds to the strong cluster $cs_1$ in
Figure 2(a). Our system created a new Facade interface (IRoom), and implemented
it to the Facade class (Room). IRoom designates the getter method and setter
method corresponding to the Room’s public field (roomNumber). Moreover, our
system packaged all class files of classes/interfaces (Room, IRoom, and
MapSite) necessary for instantiating an object of the Facade class into one
JAR file as a component.
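A hedged sketch of what the source of the extracted “Room” component might look like after steps (a) to (g) follows. Only the names Room, IRoom, MapSite and roomNumber come from the text; the `enter` method, the constructor bodies, and the use of package-private visibility (chosen so that the sketch fits in one file, whereas step (e) makes the Facade class public) are assumptions.

```java
import java.io.Serializable;

/** Stand-in for the MapSite participant; its real members are not given in the text. */
abstract class MapSite {
    abstract void enter();
}

/** Facade interface created in step (a): "I" + the Facade class name. */
interface IRoom {
    void enter();                        // step (b): public method of an inheritance-reachable class

    int getRoomNumber();                 // step (c): getter for the public field roomNumber

    void setRoomNumber(int roomNumber);  // step (c): setter for the public field roomNumber
}

/** Facade class after modification. */
class Room extends MapSite implements IRoom, Serializable { // steps (a) and (f)
    private static final long serialVersionUID = 1L;

    public int roomNumber;

    /** Basic constructor: its only argument is of a primitive type. */
    public Room(int roomNumber) {
        this.roomNumber = roomNumber;
    }

    /** Step (g): default constructor that passes the initial value of int. */
    public Room() {
        this(0);
    }

    @Override
    public void enter() {
        // Assumed empty body; the original behaviour is not described.
    }

    @Override
    public int getRoomNumber() {         // step (d)
        return roomNumber;
    }

    @Override
    public void setRoomNumber(int roomNumber) { // step (d)
        this.roomNumber = roomNumber;
    }
}
```

A client then only needs the Facade interface, for example `IRoom room = new Room(); room.setRoomNumber(7);`. In the JavaBeans packaging convention, the manifest of step (i) would mark the Facade class with an entry such as `Name: Room.class` followed by `Java-Bean: True`.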

2.5 Storing and Indexing Components 

Our system stores all JAR files of extracted components into the component 
repository. Moreover, our system segments all source codes of components into 
words, and records each word according to type of word (classes/interfaces used 
in the component, classes/interfaces that used the component’s participants in 
the original program, component’s participants, methods, fields, variables, and 
comments) and number of appearances. Our system stores the word information 
in the index table on the RDB. 
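The segmentation and counting described above can be sketched as follows. The splitting rule (camel-case boundaries and underscores), the `WordType` names, and the in-memory map standing in for the index table on the RDB are assumptions made for illustration; the text does not specify them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class WordIndex {
    /** Word types listed in the text; the enum names are hypothetical. */
    enum WordType { USED_TYPE, USING_TYPE, PARTICIPANT, METHOD, FIELD, VARIABLE, COMMENT }

    // Key: "<type>:<word>", value: number of appearances.
    private final Map<String, Integer> counts = new HashMap<>();

    /** Splits an identifier such as "TableSorter" into the words "table" and "sorter". */
    static List<String> splitIdentifier(String identifier) {
        List<String> words = new ArrayList<>();
        // Break where a lower-case letter or digit is followed by an upper-case letter,
        // and at underscores.
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])|_+")) {
            if (!part.isEmpty()) {
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Records every word of the identifier under the given word type. */
    void add(WordType type, String identifier) {
        for (String word : splitIdentifier(identifier)) {
            counts.merge(type + ":" + word, 1, Integer::sum);
        }
    }

    /** Number of appearances of the word under the given word type. */
    int count(WordType type, String word) {
        return counts.getOrDefault(type + ":" + word.toLowerCase(Locale.ROOT), 0);
    }
}
```

Keeping the word type in the key is what later allows a search restricted to, say, method names only.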

In addition to word information, our system records measurement values by
applying the following two reusability metrics to all components: DIT[9] for
the Facade class (the confidence interval is [lower confidence limit: 2, upper
confidence limit: 4]), and SCCr[8] ([0.61, 1.0]). We previously confirmed that
a component’s reusability is high if the measurement values of these metrics
lie within their corresponding confidence intervals [8]. We calculated the
confidence intervals by interval estimation, using all JavaBeans components
that were judged to be highly reusable at JARS.COM. The measurement values of
these metrics help users to select components with higher reusability.
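As a small illustration of the DIT check, the sketch below measures the depth of the inheritance tree of a loaded class by reflection and tests it against the interval [2, 4]. The class `DitMetric` is hypothetical, and counting `java.lang.Object` as depth 0 is an assumption; the counting convention used in [9] may differ. SCCr is not sketched because its definition is not given here.

```java
final class DitMetric {
    /** Depth of inheritance tree: number of superclasses up to java.lang.Object. */
    static int dit(Class<?> type) {
        int depth = 0;
        for (Class<?> s = type.getSuperclass(); s != null; s = s.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    /** True if the measurement value lies within the closed interval [lower, upper]. */
    static boolean within(double value, double lower, double upper) {
        return value >= lower && value <= upper;
    }

    /** DIT check for a Facade class against the interval quoted in the text. */
    static boolean ditWithinInterval(Class<?> facade) {
        return within(dit(facade), 2, 4);
    }
}
```

Under this convention `java.util.ArrayList` has a DIT of 3 and falls inside the interval, whereas a class that extends `Object` directly has a DIT of 1 and does not.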

2.6 Searching Components 

Our system searches components based on given keywords and the RDB includ- 
ing index information. Search execution sequences are described as follows. 

1. Our system receives one or more keywords input by a user via the web 
browser. Moreover, our system accepts the advanced search conditions including 
the type of each keyword. For example, the user can search components that 
contain keywords “rename” and “copy” in only method names. 

2. Our system issues an SQL query to the RDB, and presents a list of
components in the web browser. All found components contain the given
keywords in their source code. In the web browser, components are sorted in
descending order of rank weight. We define the rank weight $w_c$ of the
component $c$ as follows, based on the TF-IDF scheme [10]:



where $m$ is the number of given keywords, $t_i$ ($i = 1, \ldots, m$) denotes
each keyword, $d(c)$ is the set of distinct words appearing in $c$, $tf(t, c)$
is the frequency of term $t$ in $c$, $N$ is the total number of components in
the component repository, and $df(t)$ is the number of components in which
term $t$ appears.

$w_c$ reflects the term frequency of the given keywords in $c$. At the same
time, $w_c$ also reflects the term specificity of the given keywords across
all components.

3. Our system presents structural information (name of the Facade 
class/interface, other participants, and classes/interfaces that used the com- 
ponent’s participants in the original program) and measurement values of the 
component that is selected from a result list by the user. The user can confirm 
structural features of the component by structural information and measurement 
values on the web browser. 

4. Finally, if the user judges that the component satisfies his/her requirement, 
the user downloads and reuses a JAR file of the component. 
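To make the ranking of step 2 concrete, the following Java sketch computes a rank weight from the quantities defined there. It assumes one common TF-IDF form, $w_c = \sum_{i=1}^{m} \frac{tf(t_i, c)}{\sum_{t \in d(c)} tf(t, c)} \cdot \log \frac{N}{df(t_i)}$, which uses exactly those quantities; this particular form, the natural logarithm, and the class name `RankWeight` are assumptions rather than the definition used by the system.

```java
import java.util.List;
import java.util.Map;

final class RankWeight {
    /**
     * @param keywords        the given keywords t_1..t_m
     * @param tf              frequency of each distinct word in component c (keys form d(c))
     * @param totalComponents N, the number of components in the repository
     * @param df              number of components in which each term appears
     */
    static double weight(List<String> keywords, Map<String, Integer> tf,
                         int totalComponents, Map<String, Integer> df) {
        long totalWords = 0;
        for (int frequency : tf.values()) {
            totalWords += frequency;
        }
        if (totalWords == 0) return 0.0;
        double w = 0.0;
        for (String t : keywords) {
            int frequency = tf.getOrDefault(t, 0);
            int documents = df.getOrDefault(t, 0);
            if (frequency == 0 || documents == 0) continue; // keyword absent: contributes nothing
            double termFrequency = (double) frequency / totalWords;                       // normalized tf
            double inverseDocumentFrequency = Math.log((double) totalComponents / documents); // idf
            w += termFrequency * inverseDocumentFrequency;
        }
        return w;
    }
}
```

With this form, a keyword that makes up half of a component's words and appears in 5 of 10 components contributes 0.5 · ln 2 to the weight, while a keyword that appears in every component contributes nothing, which matches the remark that the weight reflects both term frequency and term specificity.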

3 Experimental Evaluation 

We have implemented the search system proposed in previous sections. We 
used Java2SE SDK for the system’s implementation, JavaCC for making a Java 
parser, MySQL as the RDB, and Java Servlet for the runtime environment. 
3.1 Extraction Experiment

We attempted to extract all possible components using all classes as
starting nodes in ten programs: JUnit[11], JBeans[12], Regexp[13], JXPath[13],
TableExample[14], Metalworks[14], SampleTree[14], Font2DTest[14],
FileChooser[14] and SwingSet2[14]. No classes in these programs had been
packaged as components. Table 1 shows, for each program, the number of
extracted strong/weak components (denoted as $N_{All}$), the average number of
participants, excluding the newly created Facade interface, in each component
($\#V_c$), and the number of components for which at least one measurement
value of the two reusability metrics lies within the corresponding confidence
interval ($N_{DS}$). $N_c$ denotes the total number of classes/interfaces in
each program.

In Table 1, our system extracted several strong components and many weak
components from the ten programs regardless of the kind of program. According
to the measured reusability metrics, 72% of all extracted components (204 of
282) are highly reusable. This result suggests that our system can
automatically detect reusable parts and extract them as components from a
collection of Java programs.



Table 1. Number of extracted components (strong/weak)

Program        N_c   N_All      N_DS       #V_c
JUnit          181   11 / 57     8 / 48     1.5 / 24.1
Regexp          16    9 / 4      2 / 3      4.8 / 7.5
JXPath         121   22 / 26     9 / 8      2.6 / 58.4
JBeans         241   16 / 89    12 / 75     1.4 / 23.0
TableExample    18    5 / 3      4 / 3      2.0 / 4.1
Metalworks      32   16 / 1     14 / 1      5.3 / 2.0
SampleTree      18    3 / 2      2 / 1      7.0 / 2.0
Font2DTest      20    2 / 0      2 / 0     19.0 / 0.0
FileChooser      8    3 / 0      3 / 0      3.3 / 0.0
SwingSet2      130   12 / 1      9 / 0      1.2 / 38.0
Total (Avg.)   785   282        204        (19.0)


3.2 Search Experiment 

We searched all extracted components (from the ten programs) with the keyword
“panel” to retrieve widgets that represent GUI panels. Eighteen components
contained the given keyword; they are sorted by rank weight as shown in
Table 2(a). Table 2(a) shows the number of participants, excluding the newly
created Facade interface, in each component ($\#V_c$), reusability (DS), and
adaptability to the requirement (AD). DS denotes whether at least one
measurement value of the two reusability metrics lies within its confidence
interval. AD denotes whether the component satisfies the given requirement. We
judged that a component satisfies the given requirement if it is a GUI widget
containing a panel.

Among the matched components, almost all (94%) are highly reusable according
to the reusability metrics. Moreover, 67% of the matched components satisfy the
given requirement. Six components are not GUI widgets; however, all of these
components instantiate GUI widgets that contain panels. Therefore, our system
can help users retrieve components that satisfy or relate to their
requirements from a collection of Java programs. Since many of the matched
components consist of a number of participants, our approach of packaging
related classes/interfaces has functioned effectively. With conventional
techniques that target an individual source file or class as the retrieval
unit, the user has to acquire related source files/classes in addition to the
matched ones.

As an experiment concerning non-GUI functions, we searched components
with the keyword “sort” to retrieve components that provide sort functions
(Table 2(b)). We judged that a component satisfies the given requirement if it
provides a specific function to sort data sets. Three components satisfy the
given requirement. Because of the high frequency of the keyword in these
components, they are ranked high. This result suggests that our system helps
users select appropriate components by ranking them in order of rank weight.

Table 2. (a) Results with a keyword “panel” (b) Results with a keyword “sort”

(a)
Rank  Component                 #V_c  DS   AD   w_c
1     DirectionPanel            2     Yes  Yes  0.14
2     MetalworksPrefs           5     no   Yes  0.12
3     TableExample              8     Yes  no   0.03
4     FilePreviewer             1     Yes  Yes  0.03
5     Metalworks                32    Yes  no   0.02
6     MetalworksFrame           31    Yes  Yes  0.02
...
15    MetalworksDocumentFrame   2     Yes  Yes  0.01
...
18    Font2DTestApplet          20    Yes  no   0.00

(b)
Rank  Component                 #V_c  DS   AD   w_c
1     Sorter                    1     Yes  Yes  0.05
2     TableExample3             6     Yes  no   0.01
3     TableSorter               3     Yes  Yes  0.01
4     TableExample              8     Yes  no   0.01
5     SorterTest                13    Yes  no   0.00
6     AllTests                  44    no   no   0.00
7     SampleTree                18    Yes  Yes  0.00




4 Related Work 

Our system is the first to search existing OO programs by extracting reusable
components. Nonetheless, our approach bears a resemblance to both software
clustering techniques and search techniques.

Software clustering attempts to decompose software systems into meaningful
subsystems to facilitate understanding of those systems or to reuse
subsystems [15]. However, conventional clustering techniques do not guarantee
that the resulting subsystems have no dependence on elements outside of each
subsystem.

Conventional search techniques that do not require any external description
are based on the automatic extraction of word or type information from
source code. These conventional techniques target an individual source
file [1,2,3,4] or type [5] as the retrieval unit. jCentral [2], SPARS [4] and
other code analysis tools (such as Understand for Java[16]) help the
acquisition of dependent classes by representing relations among classes.
However, users must still perform additional tasks to acquire the source code
corresponding to dependent classes. Moreover, when two or more related classes
are obtained, users must judge which class plays the role of controlling the
other obtained classes.

There are several search techniques that use an external description, such as
the faceted classification technique [17]; however, the preparation costs
become large because additional descriptions are necessary.

5 Conclusion 

We have implemented a component-extraction-based program search system
that automatically extracts JavaBeans components from existing Java program
source code, and helps the user search the extracted components by given
keywords. Through experimental evaluation, we have confirmed that our system
is useful for retrieving reusable components from a collection of Java programs.
Our approach can be similarly applied to other component architectures. We will
extend our system to accept source code written in other languages and
to search components based on other component architectures, such as ActiveX.
Abstract. Non-stop and highly available applications need to be dynamically
adapted to new conditions in their execution environment, to new user
requirements, or to situations that are usually unpredictable at build time.
Banking, aeronautic, mobile and Internet applications are well-known examples
of applications requiring dynamic reconfiguration. On the other hand,
development complexity and cost constitute an important problem in creating
applications that can be dynamically reconfigured. The work we present
in this paper is centered on the dynamic reconfiguration of component-based
applications. It is dedicated to describing DYVA, a virtual dynamic
reconfiguration machine. The virtual aspect of DYVA means its independence
from any particular application or component model, which enhances
its genericity and its reusability.

Keywords. Dynamic reconfiguration, Component model, Metamodel, DYVA. 



1 Introduction 

For some critical and highly available applications, dynamicity is considered
an important criterion for guaranteeing acceptable availability and quality of
service. In this paper, by dynamic we mean the ability to reconfigure a running
application to take into account new conditions, sometimes unpredictable,
without completely stopping it. In general, the dynamic character of an
application is proportional to its complexity; in other words, the more dynamic
we want a system to be, the more complex the development work we must complete.

The work we present in this paper is centered on the dynamic reconfiguration
of component-based applications [1]. It is dedicated to describing DYVA, a
virtual dynamic reconfiguration machine. In this paper we consider that an
application configuration is the set of components forming the application and
their interconnections. A reconfiguration can then be defined as any operation
whose role is to modify the initial configuration. An operation can be, for
example, the disconnection of components, the creation of new connections, the
modification of existing connections (reconnection), the addition or removal
of components, or the replacement of components or groups of components. A
dynamic reconfiguration can then be defined as a reconfiguration performed on
a running application without fully stopping it and whilst preserving its
consistency.

This paper is organized as follows: in Section 2 we discuss our motivations for
the notion of a virtual dynamic reconfiguration machine. Section 3 is dedicated
to describing DYVA and, before we conclude, Section 4 explains through an
example how to use DYVA to support the dynamic reconfiguration of a
component-based application.



2 Motivations and Objectives 

Our motivations for the idea of a virtual machine to support the dynamic reconfigura- 
tion of component-based applications have two facets: the importance of the dynamic 
character of applications and the virtual aspect of our dynamic reconfiguration ma- 
chine. 

The dynamicity is in general relevant for some classes of critical, non-stop and 
highly-available applications where the continuity of service is very important. We 
need to dynamically reconfigure a running application for many reasons: 

- The clients of the application need a continuous service and cannot tolerate
  interruption of the functionalities provided by the application, as in
  aeronautic and real-time applications.

- The execution environment changes frequently, and it is impractical to stop
  the application every time to take the new parameters into account. For
  example, a multimedia presentation application is closely tied to variations
  in bandwidth. Such an application must adapt its presentation policy to this
  bandwidth.

- Stopping and restarting the application requires a lot of effort or decreases
  the quality of service. A large distributed application, for example,
  requires complex and tedious work to be stopped, updated, rebuilt and
  correctly restarted; it is therefore easier and preferable to perform the
  required modifications without completely stopping the application.

It is important to note that an application may be reconfigured for other
reasons, such as those discussed in [2]. In the previous paragraph we were
interested only in the reasons for which the reconfiguration must be done
dynamically. Other examples of domains where dynamic reconfiguration is needed
are given in [3].

In spite of its importance, dynamicity remains one of the key challenges facing
software developers today. To develop an application that can be dynamically
reconfigured, the developer must deal with the reconfiguration code instead of
focusing on the application logic. In [4] we presented and evaluated many
approaches dealing with the dynamic reconfiguration of component-based
applications. Approaches like [5,6] associate with each component a specific
part dedicated to its management and responsible for providing the
functionalities necessary for its dynamic reconfiguration.

DYVA, which stands for “DYnamic Virtual Adaptation machine”, is our solution
for supporting the dynamic reconfiguration of component-based applications. It
is the result of several works that we completed previously [4, 7, 8, 9]. DYVA
is intended to be generic, meaning that it is independent of any particular
application or component model. The idea is to start from a concrete
application that can be instrumented, and to create an image of this
application according to a canonical model. At run-time the canonical
representation can be dynamically reconfigured, which causes the
reconfiguration of the concrete application (the canonical representation and
the application are causally connected).



3 DYVA: Our Dynamic Virtual Reconfiguration Machine 

3.1 Overview of Our Approach 

Recently we proposed and prototyped a dynamic reconfiguration framework for
JavaBeans-based applications [7,10]. This framework makes it possible to
dynamically reconfigure a JavaBeans-based application with minimal
participation of the user. In the context of another project, the same work
was done for OSGi-based applications [8,11]; this second work allowed us in
particular to validate the dynamic reconfiguration process proposed in [7].

After working on two different component models, namely JavaBeans and OSGi,
some basic concepts appeared to be shared by both models, for instance the
dynamic reconfiguration routines and process. This observation was a key
element that motivated us to work on a virtual dynamic reconfiguration
framework.

3.2 DYVA Logical Architecture 

The global architecture of DYVA is presented in Figure 1. 

As shown in Figure 1, three parts can be distinguished:

- The base-level: represents the concrete application that provides the
  expected functionalities.

- The meta-level: transparent to the application users, it can be seen as an
  abstract representation of the concrete application.

- The DYVA kernel: the different operational modules of the dynamic
  reconfiguration machine.

3.2.1 Causal Connection 

It is important to note that the meta-level is not merely an independent
snapshot of the base-level; it is causally connected to it: modifying the
meta-level causes a corresponding modification to the base-level, and vice
versa. Monitoring and modification operations realize the causal connection
between the base-level and its abstraction. They ensure that the modifications
occurring at one level are propagated to the other level in order to guarantee
that each level gives an accurate view of the other. Causal connection is an
important concept in reflective architectures [12].
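The two directions of the causal connection can be illustrated with a deliberately small Java sketch. It is not DYVA's API: the classes `BaseComponent` and `CausalLink`, the single `target` interaction point per component, and the method names are all hypothetical, and consistency handling during reconfiguration is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** A base-level component with a single interaction point (receptacle). */
final class BaseComponent {
    final String name;
    BaseComponent target; // the component this one is currently connected to

    BaseComponent(String name) {
        this.name = name;
    }
}

/** Keeps a meta-level image (name -> connected name) in step with the base-level. */
final class CausalLink {
    private final Map<String, BaseComponent> baseLevel = new HashMap<>();
    private final Map<String, String> metaConnections = new HashMap<>();

    /** Creates the meta-level image of a base-level component. */
    void register(BaseComponent component) {
        baseLevel.put(component.name, component);
        metaConnections.put(component.name,
                component.target == null ? null : component.target.name);
    }

    /** Meta-level to base-level: a modification operation applied to the image. */
    void reconnect(String from, String to) {
        BaseComponent source = baseLevel.get(from);
        BaseComponent destination = baseLevel.get(to);
        if (source == null || destination == null) {
            throw new IllegalArgumentException("unknown component");
        }
        metaConnections.put(from, to);
        source.target = destination; // propagate the change to the running application
    }

    /** Base-level to meta-level: called by monitoring when a connection changed. */
    void baseChanged(BaseComponent component) {
        metaConnections.put(component.name,
                component.target == null ? null : component.target.name);
    }

    /** What the meta-level currently believes the component is connected to. */
    String metaTarget(String name) {
        return metaConnections.get(name);
    }
}
```

Calling `reconnect("a", "b")` updates both the image and the real object, while a change made directly in the application becomes visible at the meta-level once `baseChanged` is invoked for the affected component.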




[Figure 1 shows three parts: the DYVA Kernel, the Meta-Level, and the Base-Level.]

Fig. 1. DYVA global architecture



3.2.2 Sensors 

Sensors are the concrete elements that perform the monitoring. They are responsible for 
observing the application and its execution environment and for sending up the relevant 
events occurring at the base-level. By relevant we mean a change at the base-level 
concerning an entity represented at the meta-level. For example, if a property of a 
component instance is modified, an event is sent up only if this property forms part of 
the instance state or represents an interaction point (receptacle); otherwise no event 
is needed. For performance reasons, notification events are limited to information 
represented at the meta-level. There are two kinds of sensors: 

Environment sensors: a sensor of this category is a module permanently exe- 
cuted (as a daemon) and controlling a specific environment resource (disk, 
memory, load, bandwidth...). 

Application sensors: an application sensor is an agent belonging to the appli- 
cation code, and responsible for informing the meta-level when a significant 
event occurs in the application. In the current version an event is sent in the 
following situations: 

Component instance created: an event specifying the reference of 
the created instance is fired. 

Method called: the user can choose to send an event for each called 
method or to select only some methods. 

Property affected: if a property forms part of a component 
instance state, an event is sent up when its value changes. If the property 
represents an interaction point between two instances and its value 
changes, the application architecture has been modified, and an 
event is sent up as well. 
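
The filtering rule for property events can be sketched as follows. This is an illustrative sketch only: the class `ApplicationSensor`, its fields and the event tuples are assumptions made for the example and are not part of DYVA.

```python
from dataclasses import dataclass, field


@dataclass
class ApplicationSensor:
    """Hypothetical sensor that forwards only meta-level-relevant changes."""

    state_properties: set   # properties that form part of the instance state
    receptacles: set        # properties that are interaction points
    events: list = field(default_factory=list)

    def property_changed(self, instance, prop, value):
        if prop in self.state_properties:
            self.events.append(("state_changed", instance, prop, value))
        elif prop in self.receptacles:
            # A changed interaction point means the architecture changed.
            self.events.append(("architecture_changed", instance, prop, value))
        # Any other property is not represented at the meta-level: no event.


sensor = ApplicationSensor(state_properties={"counter"}, receptacles={"cp"})
sensor.property_changed("client-1", "counter", 1)
sensor.property_changed("client-1", "cp", "server-1")
sensor.property_changed("client-1", "debug_flag", True)  # ignored
```

Only the first two changes produce events; the third concerns a property that the meta-level does not represent.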

3.2.3 Supervisor 

The role of the supervisor is to allow the auto-reconfiguration of applications. It repre- 
sents a reasoning engine that introspects or receives events from the system abstrac- 
tion and takes decisions according to these events and according to some reconfigura- 
tion policies. A decision may have one of the following forms: 

Invocation of the reconfiguration manager operations, for example if some 
component instance is no longer available, the supervisor may ask the recon- 
figuration manager to create a new instance of the same component and to re- 
connect it in the place of the unavailable one. 

Delegation of the decision to an external actor if the supervisor is not able to 
reason about the received event and to take an appropriate decision. In this 
case the supervisor displays a message describing the situation and a human 
administrator, for example, must intervene to take a decision. 
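
The two forms of decision can be sketched as a small rule lookup. The paper does not give the policy format, so the dictionary of policies, the event fields and the function `supervise` below are all assumptions chosen for illustration.

```python
def supervise(event, policies, delegate):
    """Return reconfiguration-manager operations for an event, or delegate.

    policies maps an event type to a function producing a list of operations
    (hypothetical format). delegate is called when no policy applies, which
    stands for handing the decision over to an external actor.
    """
    rule = policies.get(event["type"])
    if rule is None:
        delegate(event)
        return None
    return rule(event)


# Hypothetical policy: recreate an unavailable instance and reconnect it.
policies = {
    "instance_unavailable": lambda e: [
        ("create_instance", e["component"]),
        ("reconnect", e["instance"]),
    ],
}

delegated = []
plan = supervise(
    {"type": "instance_unavailable", "component": "ServerFR", "instance": "s1"},
    policies,
    delegated.append,
)
unknown = supervise({"type": "disk_full"}, policies, delegated.append)
```

Here `plan` holds the operations to request from the reconfiguration manager, while the second event has no matching policy and ends up in `delegated` for a human administrator.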

3.2.4 Reconfiguration Policies 

Policies are required to make the auto-reconfiguration of applications possible. 
Auto-reconfiguration means the ability to take consistent dynamic reconfiguration 
decisions without the intervention of an external actor (usually a human 
administrator). The current specification of reconfiguration policies is very 
primitive: it supports only a very simple form of reconfiguration rules. Improving 
this specification is part of our future work. 

3.2.5 Reconfiguration Manager 

The reconfiguration manager is the central part of the reconfiguration framework. It 
provides and implements the basic dynamic reconfiguration routines. These routines 
operate on the system abstraction (and not directly on the base-level), which guarantees 
the independence and the genericity of our machine. The following points describe 
the main operations implemented by the reconfiguration manager. All of these 
operations are applied at the abstract level and then propagated to the base-level 
through the causal connection: 

- Removing a component instance: a component instance can be removed only if it 
is in a stable state. For example, if an instance is modifying a file or a database, it 
should not be removed before the end of its writing task. When a component 
instance is removed, all its connections with other instances are systematically 
removed. 

- Removing a component: removing a component instance does not affect other 
instances of the same component. Sometimes, instead of removing one instance, it is 
desirable to remove a whole component. In this case the component and its resources 
are unloaded and all its instances are removed. 

- Dynamic disconnection: two kinds of dynamic disconnection are supported by the 
reconfiguration manager: 

• Port-level disconnection: the administrator must specify the source and the 
target ports concerned by the disconnection. 

• Instance-level disconnection: all the provided and required ports of the in- 
stance are concerned by the disconnection. 

- Dynamic connection: to create a new connection the administrator must specify the 
required port and the provided port to be connected. These two ports must be 
compatible to allow the connection operation. 

- Dynamic reconnection: the dynamic reconnection uses the disconnection and 
connection operations described above. It is a fundamental operation which is useful 
for all other reconfiguration operations. The reconfiguration manager provides 
two kinds of dynamic reconnection: 

• Port-level reconnection: only one port is concerned by the reconnection. 

• Instance-level reconnection: all the provided and required ports of the 
instance are concerned by the reconnection. 

- Creating a new component instance: the extension or the adaptation/correction of 
the application functionalities may require the creation of new instances. 

- Replacing a component instance: for performance or correction reasons it is 
sometimes necessary to replace one component instance by another one more suitable 
for the application requirements. The instance replacement is a complex process 
composed of a set of ordered activities that can be summarized as follows: 

Creating the replacing instance if it does not already exist. 

Passivating the old instance which means stopping all ingoing requests 
sent to this instance. The stopped requests should be saved and treated 
later to complete the reconfiguration operation. 

Transferring the state of the old instance to the new one for consistency 
reasons. 

Reconnecting the new instance in the place of the replaced one. 

Activating the new instance and potentially removing the old one. 

State transfer is a very complex problem. In this paper the state of a component 
instance is considered simply as a subset of its data-structure values. These data 
structures have to be accessible so that the state of the replaced instance can be read 
and written into the new one. Other complex aspects of state transfer, such as 
inaccessible data structures and the execution state (threads), have not been addressed. 
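
The ordered replacement activities can be sketched as below, under strong simplifying assumptions: an instance is a plain object with a state dictionary, connections are a dictionary from a (client, port) pair to the connected instance, and requests that arrive during the reconfiguration are already collected in a list. The names `Instance` and `replace_instance` are hypothetical.

```python
class Instance:
    """Hypothetical component instance with an accessible state."""

    def __init__(self, name, state=None):
        self.name = name
        self.state = dict(state or {})
        self.active = True
        self.handled = []

    def handle(self, request):
        self.handled.append(request)


def replace_instance(old, new, connections, pending_requests, state_keys):
    """Replace old by new; the replacing instance is assumed to exist (step 1)."""
    # Step 2: passivate the old instance; incoming requests are assumed to be
    # saved by the caller in pending_requests while it is passive.
    old.active = False
    # Step 3: transfer the accessible subset of the state.
    for key in state_keys:
        new.state[key] = old.state[key]
    # Step 4: reconnect the new instance in place of the old one.
    for port, target in list(connections.items()):
        if target is old:
            connections[port] = new
    # Step 5: activate the new instance and treat the saved requests.
    new.active = True
    for request in pending_requests:
        new.handle(request)


server_fr = Instance("ServerFR", {"counter": 5, "cache": "x"})
server_eu = Instance("ServerEU")
wiring = {("Client", "cp"): server_fr}
replace_instance(server_fr, server_eu, wiring, ["req-1"], ["counter"])
```

Only `counter` is transferred in this usage, which mirrors the view of state as a chosen subset of data-structure values; removing the old instance afterwards is left out.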



4 How to Use DYVA in Practice 

DYVA can be used in two different ways: declaratively or programmatically. 

- Using DYVA declaratively: a description of each component forming the ap- 
plication is necessary. A dedicated tool uses this description provided by the 
user and instruments the application according to it. 

- Using DYVA programmatically: a description of the application components 
is also required in this case; however, the developer has to explicitly use an 
API provided by DYVA. Therefore, in this second case, no instrumentation tool 
is needed. 

In this section we describe only the declarative usage of DYVA. We first show 
through an example how to create the description file of a simple application. Then 
we describe the instrumentation process, focusing particularly on the modifications 
introduced into the application during this process. 



4.1 Application Description 

The first step before using DYVA is to provide a description of the application to be 
reconfigured. This description must follow the canonical model presented 
in the previous section. Depending on the application component model, it is 
possible to build semi-automatic description tools that analyze applications 
and partially discover their description. If the targeted component model 
explicitly provides the application description in a given format (an ADL for example), a 
transformation tool can be used to map this description to the canonical model. 



[Figure: example application “MyApplication”; the legend distinguishes provided ports from required ports] 

Fig. 2. Application example 



The description file of the application example presented in Figure 2 looks like the 
following. 



<Repository> 
  <Components> 
    <Component name="Client"> 
      <RequiredPorts> 
        <RequiredPort name="cp" type="IServer" /> 
      </RequiredPorts> 
      <Implementation name="example01.Client" /> 
    </Component> 
    <Component name="ServerFR"> 
      <ProvidedPorts> 
        <ProvidedPort name="sp" type="IServer" /> 
      </ProvidedPorts> 
      <Implementation name="example01.ServerFR" /> 
    </Component> 
  </Components> 
</Repository> 



For simplicity, many details have been omitted from the description file, 
such as reconfiguration dependencies between components, reconfiguration 
compatibilities and other metadata describing the internal structures forming the 
state of each component. 
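
As an illustration of how such a description could be consumed, the following sketch parses the example with Python's standard `xml.etree.ElementTree` module. The embedded XML is a reconstruction of the example (the implementation names `example01.Client` and `example01.ServerFR` are read from a damaged listing and are assumptions), and treating two ports as compatible when their `type` attributes are equal is also an assumption; DYVA's own tools may work differently.

```python
import xml.etree.ElementTree as ET

# Reconstructed example description; attribute values are assumptions.
DESCRIPTION = """
<Repository>
  <Components>
    <Component name="Client">
      <RequiredPorts>
        <RequiredPort name="cp" type="IServer" />
      </RequiredPorts>
      <Implementation name="example01.Client" />
    </Component>
    <Component name="ServerFR">
      <ProvidedPorts>
        <ProvidedPort name="sp" type="IServer" />
      </ProvidedPorts>
      <Implementation name="example01.ServerFR" />
    </Component>
  </Components>
</Repository>
"""


def load_components(xml_text):
    """Map each component name to its ports and implementation class."""
    root = ET.fromstring(xml_text)
    components = {}
    for comp in root.iter("Component"):
        components[comp.get("name")] = {
            "required": {p.get("name"): p.get("type")
                         for p in comp.iter("RequiredPort")},
            "provided": {p.get("name"): p.get("type")
                         for p in comp.iter("ProvidedPort")},
            "implementation": comp.find("Implementation").get("name"),
        }
    return components


def compatible_connections(components):
    """List (client, required port, server, provided port) with equal types."""
    pairs = []
    for client, c in components.items():
        for rport, rtype in c["required"].items():
            for server, s in components.items():
                for pport, ptype in s["provided"].items():
                    if rtype == ptype:
                        pairs.append((client, rport, server, pport))
    return pairs


components = load_components(DESCRIPTION)
candidates = compatible_connections(components)
```

For the example, the only candidate connection links the `cp` port of `Client` to the `sp` port of `ServerFR`.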

It is important to note that the previous file describes only the components and not 
the application architecture. Information about the application architecture, in terms of 
component instances and their interconnections, is discovered at run-time. The 
following section explains how this is done. 



4.2 Application Instrumentation 

Once the application description is available, an instrumentation tool is used to 
introduce some modifications into the application code. In the following we show the 
instrumentation process for the application example presented above. This 
application example has been implemented in the OSGi component model. 

After the instrumentation, the following modifications are introduced in each com- 
ponent implementation: 

In each constructor, adding a call to DYVA to inform it when an instance is 
created. 

A call to DYVA is inserted at the beginning and at the end of public meth- 
ods. This is optional and can be parameterized by the user to select only 
some methods or to call only at the beginning or only at the end of some 
methods. 

If a property forms part of a component instance state, a call to DYVA 
is inserted to inform it when the property value is modified. If the property 
represents an interaction point between two instances (required port), a call to 
DYVA is likewise inserted to inform it when the property value is modified (which 
means that the application architecture has been modified). 

For each required port two methods are needed to connect and to disconnect 
the port. If these methods are not provided by the developer, they are in- 
serted automatically by the instrumentation tool. 

A call to DYVA means simply sending an event with the appropriate information 
to the meta-level. This allows the application architecture to be dynamically created. 
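
The effect of these modifications can be imitated in a few lines. The sketch below is a Python analogue, not the DYVA instrumentation tool (which targets OSGi applications): the function `instrument`, the `send` callback, the event names and the generated `connect_<port>` / `disconnect_<port>` method names are all hypothetical. The creation event is sent at the start of the constructor, which is a choice made for this sketch.

```python
import functools


def _traced(name, method, send):
    """Wrap a method so that an event is sent at its beginning and end."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        send("method_begin", self, name)
        try:
            return method(self, *args, **kwargs)
        finally:
            send("method_end", self, name)
    return wrapper


def instrument(cls, send, state_props=(), required_ports=()):
    """Patch cls in place so that it reports events through send(...)."""
    # Public methods (plain functions only in this sketch).
    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            setattr(cls, name, _traced(name, attr, send))

    # Constructor: report the created instance.
    original_init = cls.__init__

    @functools.wraps(original_init)
    def init(self, *args, **kwargs):
        send("instance_created", self, None)
        original_init(self, *args, **kwargs)
    cls.__init__ = init

    # Properties: report state changes and architecture changes.
    def setattr_hook(self, name, value):
        object.__setattr__(self, name, value)
        if name in state_props:
            send("state_changed", self, name)
        elif name in required_ports:
            send("architecture_changed", self, name)
    cls.__setattr__ = setattr_hook

    # Required ports: add connect/disconnect methods when they are missing.
    for port in required_ports:
        if not hasattr(cls, "connect_" + port):
            setattr(cls, "connect_" + port,
                    lambda self, target, _p=port: setattr(self, _p, target))
        if not hasattr(cls, "disconnect_" + port):
            setattr(cls, "disconnect_" + port,
                    lambda self, _p=port: setattr(self, _p, None))
    return cls


class Client:
    def __init__(self):
        self.cp = None

    def run(self):
        return "ok"


log = []
instrument(Client, lambda kind, inst, detail: log.append((kind, detail)),
           required_ports=("cp",))
client = Client()
client.run()
client.connect_cp("server-1")
```

The collected `log` is the stream of events from which a meta-level could build the application architecture at run-time.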



4.3 Reconfiguring the Application at Run-Time 

After it has started, the first task the instrumented application transparently performs 
is to create the control environment that displays its architecture graphically, as 
illustrated in Figure 3 (the control environment is created after the first event of 
the application is fired). This environment also makes it possible to control and 
dynamically reconfigure the underlying application. 

Any reconfiguration operation acts on the meta-level and is systematically propa- 
gated to the concrete application thanks to the causal connection. Using the recon- 
figuration interface, the administrator can graphically reconfigure the running appli- 
cation. 

Fig. 3. Reconfiguration interface 



For example, in Figure 3, a new component (“ServerEU”) has been added to the 
application, an instance of this component has been created (“[ 1-0 ] ServerEU”), and, 
as illustrated by the contextual menu, the old “ServerFR” instance will be replaced by 
the new “ServerEU” instance. 



5 Conclusion 

A dynamically reconfigurable application is an application that supports 
modifications to its architecture and behavior at run-time without being fully stopped, 
while preserving its integrity and an acceptable quality of service. This property 
is very important for some classes of critical, highly available and non-stop 
applications and constitutes one of the key challenges facing software developers today. 
This paper presented DYVA, a generic support that assists developers in building 
dynamically reconfigurable applications and gives administrators the means to 
reconfigure these applications at run-time. Our approach is intended to be generic and 
reusable. Automation is an important aspect of our approach: on the one hand it 
simplifies the creation of applications endowed with dynamic capabilities, and on the 
other hand it makes it possible to take some reconfiguration decisions automatically 
according to a set of strategies. 

The results we obtained after the development of DYVA allow us to conclude that 
an approach driven by the abstraction of technical and specific properties of 
applications seems to be very promising. Future work has three main facets: 
stabilizing and extending our current prototype, enlarging the set of reconfiguration 
operations currently supported, and extending the work to other component models. 
All these facets are needed to validate our approach and to prove its feasibility. 

Abstract. This paper describes a multiple-view meta-modeling approach for 
managing variability in software product lines using the Unified Modeling 
Language notation (UML). A multiple-view meta-model for software product 
lines describes how each view relates semantically to other views. The 
meta-model depicts life cycle phases, views within each phase, and meta-classes 
within each view. The relationships between the meta-classes in the different 
views are described. Consistency checking rules are defined based on the 
relationships among the meta-classes in the meta-model. This paper briefly 
describes multiple-view modeling of software product lines before describing the 
multiple-view meta-modeling approach for software product lines and an 
approach for consistency checking between meta-model views. The paper then 
provides a detailed description of the tool support for product line 
multiple-view meta-modeling, meta-model consistency checking, and product line 
member configuration from the product line architecture. 



1 Introduction 

The field of software reuse has evolved from reuse of individual components towards 
large-scale reuse with software product lines [1]. Software modeling approaches are 
now widely used in software development and have an important role to play in 
product lines [5]. Modern software modeling approaches, such as the Unified 
Modeling Language (UML), provide greater insights into understanding and managing 
commonality and variability by modeling product lines from different perspectives. A 
multiple-view model [7] of a software product line captures the commonality and 
variability among the software family members that constitute the product line. A 
better understanding of the product line can be obtained by considering the different 
perspectives, such as requirements modeling, static modeling, and dynamic modeling, 
of the product line. Using the UML notation, the functional requirements view is 
represented through a use case model, the static model view through a class model, 
and the dynamic model view through a collaboration model and a statechart model. 
While these views address both single systems and product lines, there is, in addition, 
a feature model view, which is specific to software product lines. This view describes 
the common and variant features of the product line. 

This paper starts by describing multiple-view modeling of software product lines. 
It then goes on to describe the multiple-view meta-model for software product lines. 
In order to illustrate the approach, two meta-model views are described in more de- 
tail, the product line class model view and the feature model view. The paper goes on 
to describe product line meta-modeling in UML and consistency checking between 
meta-model views. The paper then describes tool support for meta-modeling, meta- 
model consistency checking, and product line member configuration from the product 
line architecture. 



2 Multiple-View Models of Software Product Lines with UML 

A multiple-view model for a software product line defines the different characteristics 
of a software family [8], including the commonality and variability among the 
members of the family [1, 11]. A multiple-view model is represented using the UML 
notation [4, 9]. The product line life cycle includes three phases: 

Product Line Requirements Modeling: 

• Use Case Model View. The use case model view addresses the functional require- 
ments of a software product line in terms of use cases and actors. Product line 
commonality is addressed by having kernel use cases, which are common and 
therefore directly reusable in all product line members. Product line variability is 
addressed by having optional and alternative use cases, which are used by some but 
not all product line members. 

Product Line Analysis Modeling: 

• Static Model View. The static model view addresses the static structural aspects of 
a software product line through classes and relationships between them. Kernel 
classes are common to all product line members, whereas optional and variant 
classes address product line variability. 

• Collaboration Model View. The collaboration model view addresses the dynamic 
aspects of a software product line, which captures the sequence of messages passed 
between objects that realize kernel, optional, and alternative use cases. 

• Statechart Model View. The statechart model view, along with the collaboration 
model view, addresses the dynamic aspects of a software product line. A statechart 
defines states and state transitions for each state dependent kernel, optional, and 
variant class. 

• Feature Model View. A feature model view captures feature/feature dependencies, 
feature/class dependencies, feature/use case dependencies, and feature set depend- 
encies. The feature model view is the key for managing variability in software 
product lines. 

Product Line Design Modeling: During this phase, the software architecture of the 
product line is developed. 

For software product lines, it is important to address how variability is modeled in 
each of the different views. A multiple-view model is modified at specific locations 
referred to as variation points. 



3 Multiple-View Meta-model for Software Product Lines 

Consistency checking between multiple views of a model is complex, one of the rea- 
sons being the different notations that are needed. An alternative approach [6, 10] is 
to consider consistency checking between multiple views at the meta-model level. 
The meta-model describes the modeling elements in a UML model and the relation- 
ships between them. The meta-model is described using the static modeling notation 
of UML and hence just uses one uniform notation instead of several. Furthermore, 
rules and constraints can be allocated to the relationships between modeling elements. 

The multiple views are formalized in the semantic multiple-view meta-model, 
which depicts the meta-classes, attributes of each meta-class, and relationships among 
meta-classes. Relationships can be associations, compositions/aggregations (strong 
and weak forms of whole/part relationships), and generalization/specializations. A 
high-level representation of the phases containing the views in this meta-model is 
shown in Figure 1. A phase is modeled as a composite meta-class that is composed of 
the views in that phase. 

Fig. 1. High-level relationships between multiple views for a software product line 

In the meta-class model, all concepts are modeled as UML classes. However, as 
the meta-classes have different semantic meanings, they are assigned stereotypes 
corresponding to the different roles they play in the meta-model. Thus in Figure 1, all 
the meta-classes that represent the different views of a UML model are assigned the 
stereotype «view». Meta-classes representing development phases are assigned the 
stereotype «phase», as they represent the different phases of the OO lifecycle: 
Requirements Modeling, Analysis Modeling, and Design Modeling. 

Each view in Figure 1 can be modeled in more detail to depict the meta-classes in 
that view. A view meta-class is a composite class that is composed of the meta-classes 
in that view. An example is given in Figure 2, which depicts the meta-classes in the 
Class Model view and their relationships. Thus the Class Model view contains meta- 
classes such as class, attribute, relationship and class diagram, as well as the relation- 
ships between them. 

Fig. 1 depicts underlying relationships among multiple views in development 
phases of a software product line. The views in each phase are: 

Requirements phase: 

- Use case model: This model describes the functional requirements of a software 
product line in terms of actors and use cases. 

Analysis phase: 

- Class model: This model addresses the static structural aspects of a software prod- 
uct line through classes and their relationships. 

- Statechart model: This model captures the dynamic aspects of a software product 
line by describing states and transitions. 

- Collaboration model: This model addresses the dynamic aspects of a software 
product line by describing objects and their message communication. 

- Feature model: This model captures the commonality and variability of a software 
product line by means of features and their dependencies. 

Design phase [4]: 

- Consolidated collaboration model: This model synthesizes all the collaboration 
diagrams developed for the use cases. 

- Subsystem architecture model: Based on the consolidated collaboration model, this 
model addresses the structural relationships between subsystems. 

- Task architecture model: This model addresses the subsystems decomposition into 
tasks (active objects) and passive objects. 

- Refined class model: This model addresses the design of classes by determining the 
operations and attributes of each class. 



3.1 Meta-model Views 

This section describes the meta-classes and their attributes, as well as the relationships 
between the meta-classes for the Class and Feature Model views in Fig. 2. Other 
views shown in Figure 1 are described in [10]. 

Fig. 2 depicts meta-classes and relationships between the meta-classes for the class 
and feature model views. A class diagram consists of classes and their relationships. 
A class may interact with an external class, such as an external input/output device or 
user interface. Each class may have attributes. Relationships between classes are 
specialized to aggregation, generalization/specialization, and association 
relationships. To capture variations of a software product line, the meta-model 
specializes a class to a kernel, optional or variant class. Kernel classes are required by 
all members of a software product line, whereas optional classes are required by only 
some members. Variant classes are required by the specific members to meet variations 
of kernel or optional classes. 

A feature is an end-user functional requirement of an application system. Features 
are specialized (Fig. 2) to kernel, optional, and variant features depending on the 
characteristic of the requirements, that is, commonality and variability. Kernel 
features are requirements common to all systems, that is, required by all 
members of a product line. Optional features are required by only some members of a 
product line. A variant feature is an alternative of a kernel or optional feature to meet 
a specific requirement of some systems. Feature dependencies represent relationships 
between features, and feature sets refer to constraints on the choice of target features 
supported by a target system. A feature set is specialized to “mutually exclusive 
feature set,” “exactly-one-of feature set,” and “at-least-one-of feature set” [5]. 
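
The three feature set specializations can be read as cardinality constraints on the features selected for a target system. The sketch below encodes that reading; the function name `check_feature_set`, the kind labels and the feature names are hypothetical, and interpreting "mutually exclusive" as "at most one member selected" is an assumption not spelled out in the text.

```python
def check_feature_set(kind, members, selected):
    """Return True if the selected features satisfy the feature set constraint."""
    count = len(set(members) & set(selected))
    if kind == "mutually_exclusive":
        return count <= 1   # assumption: zero or one member may be selected
    if kind == "exactly_one_of":
        return count == 1
    if kind == "at_least_one_of":
        return count >= 1
    raise ValueError("unknown feature set kind: " + kind)


# Hypothetical feature names, used only to exercise the three kinds.
languages = ["English", "French", "German"]
ok_exclusive = check_feature_set("mutually_exclusive", languages, ["French"])
bad_exactly_one = check_feature_set("exactly_one_of", languages,
                                    ["French", "German"])
ok_at_least_one = check_feature_set("at_least_one_of", languages,
                                    ["French", "German"])
```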




Fig. 2. Meta-model for class model and feature model views in analysis phase 



3.2 Relationships Between Meta-model Views 

A meta-model for multiple views in a given phase of a software product line 
describes the relationships between the different views in that phase. The analysis 
phase of a software product line is viewed by 
means of the class model, collaboration model, statechart model, and feature model. 
Fig. 2 depicts a meta-model describing the relationships between the class and feature 
model views in the analysis phase. The relationships between the views are: 

- A feature in the feature model is supported by classes in the class model. 

- If there is a generalization/specialization relationship between two classes that 
support two different features respectively, the generalization/specialization rela- 
tionship between two classes maps to a feature dependency between the two fea- 
tures. 

A meta-model for multiple views in different phases of a software product line de- 
scribes the relationships between the multiple views in the different phases. It shows 
how a meta-class in a view of a phase is mapped to a meta-class in the subsequent 
phase. 



4 Consistency Checking Between Multiple Views 

Consistency checking rules are defined based on the relationships among 
meta-classes in the meta-model. The rules are used to resolve inconsistencies between 
multiple views in the same phase or in different phases, and to define allowable 
mappings between multiple views in different phases. To maintain consistency in the 
multiple-view model, rules defined at the meta-level must be observed at the 
multiple-view model level. Consistency checking is used to determine whether the 
multiple-view model follows the rules defined in the multiple-view meta-model. 



[Figure: the multiple-view model shown alongside the multiple-view meta-model] 

Fig. 3. Meta-model for feature and class model view 

Fig. 3 depicts consistency checking between a feature in the feature model and a 
class in the class model. Suppose an optional class “Class2” supports an optional 
feature “Feature2.” Class2 and Feature2 in the multiple-view model are respectively 
instances of Class and Feature meta-classes in the multiple-view meta-model. There is 
a relationship between Class and Feature meta-classes, which is “each optional class in 
the class model supports only one optional feature in the feature model.” For the 
multiple-view model to remain consistent, this meta-level relationship must be 
maintained between instances of those meta-classes, that is, Class2 and Feature2. 
Consistency checking confirms that each optional class in the class model supports 
only one optional feature in the feature model. 
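
This rule can be sketched as a check over two simple relations. The representation is an assumption made for illustration: optional classes are given as a list of names and the feature/class dependencies as (feature, class) pairs. The rule is read here as "exactly one optional feature per optional class", which is one possible interpretation of "only one"; the names other than Class2 and Feature2 are hypothetical.

```python
from collections import defaultdict


def check_optional_classes(optional_classes, feature_class_deps):
    """Return the optional classes violating the rule, with their features."""
    features_of = defaultdict(set)
    for feature, cls in feature_class_deps:
        features_of[cls].add(feature)
    return {
        cls: sorted(features_of[cls])
        for cls in optional_classes
        if len(features_of[cls]) != 1
    }


optional_classes = ["Class2", "Class3", "Class4"]
dependencies = [
    ("Feature2", "Class2"),
    ("Feature2", "Class3"),
    ("Feature3", "Class3"),
]
violations = check_optional_classes(optional_classes, dependencies)
```

In this example Class2 is consistent, while Class3 (two features) and Class4 (no feature) are reported.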



5 Tool Support for Software Product Lines 

In order to support the multiple-view meta-modeling approach, the Product Line 
UML Based Software Engineering Environment (PLUSEE) has been developed. The 
scope of the PLUSEE includes the product line engineering and target system con- 
figuration phases (Fig. 4). 

a) Product line engineering. A product line multiple-view model, which addresses 
the multiple views of a software product line, is modeled and checked for 
consistency between the multiple views. The product line multiple-view model and 
architecture are captured and stored in the product line reuse library. 

b) Target system configuration. A target system multiple-view model is configured 
from the product line multiple-view model. The user selects the desired features 
for the product line member (referred to as target system) and the tool configures 
the target system architecture. 

Fig. 4. Overview of PLUSEE 

The PLUSEE represents second-generation product line engineering tools, which 
build on experience gained in previous research [2, 3]. The following design decisions 
affecting the development of the PLUSEE have been made: 

a) Both the Rose and Rose RT commercial CASE tools are used as the interface to this 
prototype. Rose supports standard UML, but it does not generate an executable 
architecture from the product line multiple-view model. On the other hand, 
Rose RT generates an executable architecture from the product line multiple-view 
model and simulates the product line architecture, although it does not fully 
support standard UML. To take advantage of both Rose and Rose RT, two separate 
versions of PLUSEE, which are very similar to each other, were developed. 

b) The Knowledge Based Requirement Elicitation tool (KBRET) [2] and GUI de- 
veloped in previous research are used without change. 




Fig. 5. Product line engineering tools for PLUSEE 



Fig. 5 depicts the overview of the product line engineering tools for PLUSEE. A 
product line engineer captures a product line multiple-view model consisting of use 
case, collaboration, class, statechart, and feature models through the Rose tools, 
which save the model information in a Rose MDL file. From this MDL file, the 
product line relations extractor extracts product line relations, which correspond to the 
meta-classes in the meta-model. Through the product line relations extractor, a 
multiple-view model maps to product line model relational tables. Using these tables, the 
consistency checker checks for consistency of the multiple-view model by executing 
the consistency checking rules described in Section 4. After the product line engineer 
has produced a consistent multiple-view model, an executable model is developed 
using Rose Real-Time. The Rose RT executable model is based on message 
communication between active objects, which execute statecharts. Next, the product line 
dependent knowledge base is extracted by the Product line Dependent Knowledge 
Base (PLDKB) Generator for configuring target systems from the product line 
multiple-view model. 

In the target system configuration phase of PLUSEE (Fig. 6), a Knowledge Based 
Requirement Elicitation tool (KBRET) is used to assist the human target system 
requirements engineer in selecting the optional features for the target system through 
the KBRET GUI. Once the features are selected, KBRET reasons about the 
feature/feature dependencies to ensure that a consistent set of target system features is 
selected. Using the target system features, the Target System Relations Extractor 
extracts target system relations from the product line model database. The Target 
System Rose MDL File Generator uses the target system relations to generate a target 
system Rose MDL file. An executable target system is configured using the target 
system Rose MDL file. 
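
The configuration flow can be pictured with a deliberately small sketch. It does not reproduce KBRET or the PLUSEE extractors: the functions `missing_prerequisites` and `configure_target`, the "requires" form of feature/feature dependency, and all feature and class names are assumptions introduced for the example.

```python
def missing_prerequisites(selected, requires):
    """Return, for each selected feature, the required features not selected."""
    chosen = set(selected)
    problems = {}
    for feature in sorted(chosen):
        missing = set(requires.get(feature, ())) - chosen
        if missing:
            problems[feature] = sorted(missing)
    return problems


def configure_target(selected, kernel_classes, feature_class_deps):
    """Keep kernel classes plus the classes supporting the selected features."""
    chosen = set(kernel_classes)
    chosen.update(cls for feature, cls in feature_class_deps
                  if feature in selected)
    return sorted(chosen)


# Hypothetical product line data.
requires = {"Feature3": ["Feature2"]}
kernel_classes = ["Class1"]
feature_class_deps = [("Feature2", "Class2"), ("Feature3", "Class3")]

inconsistent = missing_prerequisites(["Feature3"], requires)
consistent = missing_prerequisites(["Feature2", "Feature3"], requires)
target_classes = configure_target(["Feature2", "Feature3"],
                                  kernel_classes, feature_class_deps)
```

The first selection is rejected because Feature3 is chosen without Feature2; the second is consistent and yields the classes of the target system.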

A Rose Real-Time executable model is a simulation of the target system, which is 
then executed and tested to determine whether the multiple-view model performs in 
accordance with the requirements. The main tools that were developed in PLUSEE 
are described in the next section. 




Fig. 6. Target system configuration tools of PLUSEE 



A Multiple-View Meta-modeling Approach for Variability Management 283 



6 Product Line Engineering 

6.1 Multiple-View Product Line Relations Extractor 

The multiple-view product line relations extractor generates product line relations 
from the multiple-view product line model. Rose and Rose RT save a multiple-view 
product line model in ASCII MDL and RTMDL files, respectively. In these files, 
information about the multiple-view model is stored with keywords. These keywords 
are used for extracting the information relevant to the multiple views of a software 
product line from the Rose MDL and Rose RTMDL files. The extracted product line
relations are stored in an underlying tabular representation of the multiple views,
which is later used for consistency checking and target system configuration. The
product line relations are tool independent.

6.2 Product Line Model Consistency Checker 

The product line model consistency checker identifies inconsistencies between multi- 
ple views in the same phase or different phases. The rules for consistency checking 
between multiple views are checked against the product line relations extracted from 
the product line model. For example, the consistency checking rule in Section 4, “each
optional class in the class model must support only one optional feature”, is checked 
by the consistency checker using Optional Class relation ((a) of Fig. 7) and Optional 
Feature Class Dependency relation ((b) of Fig. 7), which are derived from the multi- 
ple-view model for the flexible manufacturing product line. The Optional Class rela- 
tion contains optional classes derived from the product line static model. The Op- 
tional Feature Class Dependency relation defines a dependency between an optional 
feature and an optional class supporting the feature. To check the rule, the consis- 
tency checker confirms that each optional class in the Optional Class relation supports 
only one optional feature in the Optional Feature Class Dependency relation. For 
example, if the consistency checker finds an optional class that supports more than 
one optional feature, a kernel feature, or no feature at all, it generates a consistency 
error message for this rule. 
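
The rule check described above can be pictured as a simple query over the two relations. The following Python sketch is illustrative only: the tuple layouts, the class and feature names, and the function `check_optional_class_rule` are assumptions made for this example, not the PLUSEE implementation. The kernel-feature case is left out because it would need a relation that records feature kinds, which is not shown here.

```python
from collections import defaultdict


def check_optional_class_rule(optional_classes, feature_class_deps):
    """Return error messages for optional classes that do not support
    exactly one optional feature.

    optional_classes: iterable of class names (Optional Class relation).
    feature_class_deps: iterable of (feature, class) pairs
        (Optional Feature Class Dependency relation).
    """
    # Group the supporting features by class.
    features_by_class = defaultdict(set)
    for feature, cls in feature_class_deps:
        features_by_class[cls].add(feature)

    errors = []
    for cls in sorted(optional_classes):
        features = features_by_class.get(cls, set())
        if len(features) == 0:
            errors.append(f"{cls}: supports no optional feature")
        elif len(features) > 1:
            names = ", ".join(sorted(features))
            errors.append(
                f"{cls}: supports more than one optional feature ({names})"
            )
    return errors


# Hypothetical relation contents, invented for illustration.
optional_classes = {"ClassA", "ClassB", "ClassC"}
deps = [
    ("Feature1", "ClassA"),
    ("Feature1", "ClassB"),
    ("Feature2", "ClassB"),
]
errors = check_optional_class_rule(optional_classes, deps)
for message in errors:
    print(message)
```

In this invented data set, ClassA satisfies the rule, ClassB is reported because it supports two optional features, and ClassC is reported because it supports none.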



6.3 Product Line Dependent Knowledge Base Generator 

The product line dependent knowledge base generator generates the product line
dependent knowledge base from the product line relations. The product line
dependent knowledge base contains information about classes, optional features,
feature/feature dependencies, feature/class dependencies, generalization/specialization
relations among classes, aggregation relations among classes, and feature sets. The
product line dependent knowledge base is used by KBRET to select target system
features from the available optional features.

(a) Optional Class relation
(b) Optional Feature Class Dependency relation
Fig. 7. Product line relations for consistency checker



6.4 Knowledge Based Requirements Elicitation 

The Knowledge Based Requirement Elicitation Tool (KBRET) helps a user
select the optional features of each target system. KBRET, which was developed in
previous research [2], conducts a dialog with a human target system requirements
engineer, presenting the optional features available for configuring a target
system. The user selects the features that will belong to the target system; KBRET
reasons about feature/feature dependencies and then checks feature set constraints,
such as mutually exclusive feature sets, exactly-one-of feature sets, and one-or-more
feature sets, to resolve conflicts among features. Based on the selected features,
KBRET determines the kernel, optional, and variant classes to be included in this
target system.
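
As a rough illustration of the checks named above, the sketch below validates a feature selection against feature set constraints and feature/feature dependencies. It is a minimal sketch, not KBRET itself: the data layout, the function names, and the reading of the three constraint kinds (at most one, exactly one, at least one selected member) are assumptions made for this example.

```python
def check_feature_sets(selected, feature_sets):
    """Return the feature sets violated by a selection.

    selected: set of selected feature names.
    feature_sets: list of (kind, members) pairs, where kind is one of
        'mutually_exclusive', 'exactly_one_of', 'one_or_more'.
    """
    violations = []
    for kind, members in feature_sets:
        count = len(set(selected) & set(members))
        if kind == "mutually_exclusive":
            ok = count <= 1      # assumed meaning: at most one member
        elif kind == "exactly_one_of":
            ok = count == 1      # assumed meaning: exactly one member
        elif kind == "one_or_more":
            ok = count >= 1      # assumed meaning: at least one member
        else:
            raise ValueError(f"unknown feature set kind: {kind}")
        if not ok:
            violations.append((kind, tuple(members), count))
    return violations


def missing_prerequisites(selected, requires):
    """Map each selected feature to the prerequisites it still lacks.

    requires: dict mapping a feature to the set of features it depends on.
    """
    missing = {}
    for feature in selected:
        lacking = requires.get(feature, set()) - set(selected)
        if lacking:
            missing[feature] = sorted(lacking)
    return missing


# Hypothetical features, invented for illustration.
selected = {"A", "B", "D"}
feature_sets = [
    ("mutually_exclusive", ["A", "B"]),   # violated: both selected
    ("exactly_one_of", ["C", "D"]),       # satisfied: only D selected
    ("one_or_more", ["E", "F"]),          # violated: none selected
]
violations = check_feature_sets(selected, feature_sets)
missing = missing_prerequisites(selected, {"D": {"G"}})
```

A tool in this style could present `violations` and `missing` back to the requirements engineer so that the selection is corrected before target classes are determined.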



7 Target System Generator 

7.1 Target System Relations Extractor 

The target system relations extractor creates relations for a target system from the 
multiple-view product line relations. The goal is to tailor the product line
multiple-view model so as to configure a target system corresponding to the features selected
for the target system. To extract target system relations, the extractor uses the optional 
and variant features that a user has selected through KBRET, as well as kernel fea- 
tures that are automatically selected for all product line members. 
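
One way to picture this extraction is as a filter over the product line relations. The sketch below assumes, purely for illustration, that a feature/class dependency relation is a list of (feature, class) pairs; the function name and the sample data are hypothetical and are not taken from PLUSEE.

```python
def extract_target_relations(feature_class_deps, kernel_features,
                             selected_features):
    """Keep only the rows whose feature belongs to the target system.

    Kernel features are always included; optional and variant features
    are included only if the user selected them.
    """
    target_features = set(kernel_features) | set(selected_features)
    return [
        (feature, cls)
        for feature, cls in feature_class_deps
        if feature in target_features
    ]


# Hypothetical product line relation, invented for illustration.
product_line_deps = [
    ("KernelFeature", "ClassK"),
    ("OptionalFeature1", "ClassA"),
    ("OptionalFeature2", "ClassB"),
]
target_deps = extract_target_relations(
    product_line_deps,
    kernel_features={"KernelFeature"},
    selected_features={"OptionalFeature2"},
)
```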

7.2 Target System MDL Generator 

The target system MDL generator creates the Rose MDL file for a target system.
Using the target system relations, it marks the modeling elements that belong to
the target system by changing their color: target classes in the class model,
target use cases in the use case diagram, target objects in the collaboration
model, and target states in the statechart model. The changed color of the target
system multiple-view models (for example, yellow) distinguishes them from the
original product line multiple-view model (for example, white).



8 Conclusions 

Modern software modeling approaches, such as UML, provide greater insights into 
understanding and managing commonality and variability in software product lines. 
A multiple-view model of a software product line captures the commonality and 
variability among the software family members that constitute the product line from 
different perspectives. This paper has briefly described multiple-view modeling of 
software product lines before describing a multiple-view meta-modeling approach for
software product lines in UML and an approach for consistency checking between 
meta-model views. The paper then went on to describe in detail the tool support for 
product line multiple-view meta-modeling, meta-model consistency checking, and 
product line member configuration from the product line architecture.

Abstract. Quality assurance (QA) tasks, such as testing, profiling,
and performance evaluation, have historically been done in-house
on developer-generated workloads and regression suites. Performance-intensive
systems software, such as that found in the scientific computing
grid and distributed real-time and embedded (DRE) domains, increasingly
runs on heterogeneous combinations of OS, compiler, and hardware
platforms. Such software has stringent quality of service (QoS) require- 
ments and often provides a variety of configuration options to optimize 
QoS. As a result, QA performed solely in-house is inadequate since it 
is hard to manage software variability, i.e., ensuring software quality on 
all supported target platforms across all desired configuration options. 
This paper describes how the Skoll project is addressing these issues by 
developing advanced QA processes and tools that leverage the extensive 
computing resources of user communities in a distributed, continuous 
manner to improve key software quality attributes. 



1 Introduction 

Emerging trends and challenges. While developing quality reusable soft- 
ware is hard, developing it for performance-intensive systems is even harder. 
Examples of performance-intensive software include high-performance scientific 
computing systems, distributed real-time and embedded (DRE) systems, and 
the accompanying systems software (e.g., operating systems, middleware, and 
language processing tools). Reusable software for these types of systems must 
not only function correctly across the multiple contexts in which it is reused and 
customized - it must also do so efficiently and predictably. 

To support the customizations demanded by users, reusable performance-intensive
software often must (1) run on a variety of hardware/OS/compiler platforms and
(2) provide a variety of options that can be configured at compile- and/or
run-time. For example, performance-intensive middleware, such as web servers
(e.g., Apache), object request brokers (e.g., TAO), and databases (e.g., Oracle),
runs on dozens of platforms and has dozens or hundreds of options. While
this variability promotes customization, it creates many potential system
configurations, each of which may need extensive quality assurance (QA) to validate.
Consequently, a key challenge for developers of reusable performance-intensive
software involves managing variability effectively in the face of an exploding
software configuration space.

* This material is based upon work supported by the National Science Foundation
under Grant Nos. NSF ITR CCR-0312859, CCR-0205265, CCR-0098158.

As software configuration spaces increase in size and software development 
resources decrease, it becomes infeasible to handle all QA activities in-house. 
For instance, developers may not have access to all the hardware, OS, and com- 
piler platforms on which their reusable software artifacts will run. Moreover, 
due to time-to-market driven environments, developers may be forced to re- 
lease their software in configurations that have not been subjected to sufficient 
QA. The combination of an enormous configuration space and severe development
resource constraints therefore often forces developers of reusable software
to make design and optimization decisions without precise knowledge of their 
consequences in fielded systems. 

Solution approach → Distributed continuous QA processes
(DCQA). To manage this situation, we have initiated the Skoll
(www.cs.umd.edu/projects/skoll) project to
develop tools and processes necessary to carry out “around-the-world, around- 
the-clock” QA. Our feedback-driven Skoll approach divides QA processes 
into multiple subtasks that are intelligently and continuously distributed to, 
and executed by, a grid of computing resources contributed by end-users and 
distributed development teams around the world. The results of these executions 
are returned to central collection sites where they are fused together to identify 
defects and guide subsequent iterations of the QA process. 

Skoll QA processes are based on a client/server model. Clients distributed
throughout the Skoll grid request job configurations (implemented as QA subtask
scripts) from a Skoll server. At a high level, the Skoll process is carried out
as shown in Figure 1 (footnote 1):

1. Developers create the configuration model and adaptation strategies. Devel- 
opers create the generic QA subtask code that will be specialized when creating 
actual job configurations. 

2. A user requests Skoll client software via the registration process described
earlier. The user receives the Skoll client software and a configuration template. 
If a user wants to change certain configuration settings or constrain specific 
options he/she can do so by modifying the configuration template. 

3. A Skoll client periodically (or on-demand) requests a job configuration from 
a Skoll server. 

4. The Skoll server queries its databases and the user-provided configuration 
template to determine which configuration option settings are fixed for that 
user and which must be set by it. 

5. A Skoll client invokes the job configuration and returns the results to the
Skoll server.

6. The Skoll server examines these results and invokes all adaptation strategies.
These update the operators to adapt the global process.

1 A comprehensive discussion of Skoll components and infrastructure appears in [1],
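
The client/server interaction in the steps above can be sketched as a small in-process simulation. This is an illustrative sketch only, not Skoll's implementation or API: the class `SkollServer`, the functions `run_client` and `record_failures`, and the dictionary-based configurations and templates are all assumptions made for this example, and a real deployment would communicate over a network rather than through direct method calls.

```python
from collections import deque


class SkollServer:
    """Toy stand-in for a Skoll server (hypothetical, for illustration)."""

    def __init__(self, configurations, templates, strategies):
        self.pending = deque(configurations)  # job configurations not yet run
        self.templates = templates            # user -> options fixed by that user
        self.strategies = strategies          # adaptation strategies (callables)
        self.results = []

    def request_job(self, user):
        """Steps 3-4: return the next pending configuration that agrees
        with the option settings fixed in the user's template."""
        fixed = self.templates.get(user, {})
        for config in list(self.pending):
            if all(config.get(opt) == val for opt, val in fixed.items()):
                self.pending.remove(config)
                return config
        return None

    def report(self, config, outcome):
        """Steps 5-6: record the result, then invoke every strategy."""
        self.results.append((config, outcome))
        for strategy in self.strategies:
            strategy(self, config, outcome)


def run_client(server, user, run_job):
    """Request and execute jobs until the server has none left for this user."""
    config = server.request_job(user)
    while config is not None:
        server.report(config, run_job(config))
        config = server.request_job(user)


# Hypothetical adaptation strategy: remember failing configurations.
failed = []


def record_failures(server, config, outcome):
    if outcome == "fail":
        failed.append(config)


configs = [
    {"os": "linux", "policy": "FIFO"},
    {"os": "linux", "policy": "LIFO"},
    {"os": "windows", "policy": "FIFO"},
]
server = SkollServer(configs, {"alice": {"os": "linux"}}, [record_failures])
# The job runner is faked here: it "fails" every LIFO configuration.
run_client(server, "alice", lambda c: "fail" if c["policy"] == "LIFO" else "pass")
```

In this toy run, the client whose template fixes `os` to `linux` executes the two Linux configurations, the Windows configuration stays pending for some other client, and the strategy records the one failing configuration.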

Model-Integrated DCQA techniques. Earlier work on Skoll described the 
structure and functionality of Skoll and presented results [1] from a feasibility 
study that applied Skoll tools and processes to ACE [2] and TAO [3]. The ini- 
tial Skoll prototype provided a DCQA infrastructure that performed functional 
testing, but did not address QoS issues, nor did it minimize the cost of im- 
plementing QA subtasks. In particular, integrating new application capabilities 
into the Skoll infrastructure (such as benchmarks that quantified various QoS 
properties) required developers to write test cases manually. Likewise, extending 
the configuration models (e.g., adding new options) required the same tedious
and error-prone approach. 

This paper describes several previously unexamined dimensions of Skoll:
integrating model-based techniques with distributed continuous QA processes,
improving quality of service (QoS) as opposed to functional correctness, and using
Skoll to empirically optimize a system for specific run-time contexts. At the heart
of the Skoll work presented in this paper is CCMPerf [4], which is an open-source
toolsuite (footnote 2) that applies generative model-based techniques [5] to measure
and optimize the QoS of reusable performance-intensive software configurations.
Currently, CCMPerf, in concert with Skoll, focuses on evaluating the QoS of
implementations of the CORBA Component Model (CCM) (footnote 3), as shown in Figure 2.

Paper organization. The remainder of this paper is organized as follows:
Section 2 motivates and describes the design of CCMPerf, focusing on its model-based
generative benchmarking capabilities; Section 3 describes a case study that
illustrates how QoS characteristics captured using CCMPerf can be
fed back into models to analyze system behavior at model construction time;
Section 4 examines related work and compares it with the approaches used in
Skoll and CCMPerf; Section 5 presents concluding remarks and future work.

2 Enhancing Skoll with a Model-Based QoS Improvement 
Process 

Reusable performance-intensive software is often used by applications with strin- 
gent quality of service (QoS) requirements, such as low latency and bounded 
jitter. The QoS of reusable performance-intensive software is influenced heavily 
by factors such as (1) the configuration options set by end-users to tune the 
underlying hardware/software platform and (2) characteristics of the underlying 
platform itself. Managing these variable platform aspects effectively requires a 
QA process that can precisely pinpoint the consequences of mixing and matching 
configuration options on various platforms. 

2 CCMPerf can be downloaded from www.dre.vanderbilt.edu/cosmic

3 We focus on CCM since it is standard component middleware targeted for QoS
requirements for DRE systems.

Fig. 1. Skoll QA Process View

Fig. 2. Skoll QA Process View with CCMPerf Enhancements



In the initial Skoll approach, creating a benchmarking experiment to measure
QoS properties required QA engineers to write (1) the header files and source code
that implement the functionality, (2) the configuration and script files that tune
the underlying ORB and automate running tests and output generation, and (3)
project build files (e.g., makefiles) required to generate the executable code. Our
initial feasibility study [1] revealed that this process was
tedious and error-prone. The remainder of this section describes how we have
applied model-based techniques [5] to improve this situation. These improvements
are embodied in CCMPerf [4], a model-based benchmarking toolsuite.



2.1 Model-Based Tools for Performance Improvement 

With CCMPerf, QA engineers graphically model possible interaction scenarios.
For example, Figure 3 presents a model that shows an association between
a facet (footnote 4) and an IDL interface. It also shows the domain-specific building blocks
(such as receptacles, event sources, and event sinks) allowed in the models. Given
a model, CCMPerf generates the scaffolding code needed to run the experiments.
This typically includes Perl scripts that start daemon processes, spawn
the component server and client, run the experiment, and display the required
results.

4 A facet is a specialized port a CCM component exposes for clients to communicate
with the component.

CCMPerf is built on top of the Generic Modeling Environment (GME) [6],
which provides a meta-programmable framework for creating domain-specific
modeling languages and generative tools. GME is programmed via meta-models
and model interpreters. The meta-models define modeling languages called
paradigms that specify allowed modeling elements, their properties, and their
relationships. Model interpreters associated with a paradigm can also be built
to traverse the paradigm’s modeling elements, performing analysis and generating
code. CCMPerf was developed by creating the following two paradigms:

Fig. 3. CCMPerf modeling paradigm



Fig. 4. OCML Modeling Paradigm 



The Options Configuration Modeling Language (OCML) [7] paradigm 
models the various configuration parameters provided by ACE, TAO, and CIAO. 
Figure 4 shows ORB configuration options modeled with OCML. Inter-option 
dependencies are captured as model constraints. For instance, Figure 4 shows 
that when the demultiplexing option is set to active demultiplexing and the 
reactivation of system ids is disabled, the use of active hints is disallowed. Specific 
OCML models are then used to automatically generate large portions of Skoll’s 
QA subtask code. 
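
To make the inter-option dependency concrete, the constraint described above can be written as a predicate over a set of option settings. The sketch below is only an illustration: the option names (`demultiplexing`, `reactivate_system_ids`, `use_active_hints`) and the dictionary representation are hypothetical and are not the identifiers used by OCML, TAO, or any other real ORB.

```python
def active_hints_rule(options):
    """Return True if the settings satisfy the constraint: active hints
    may not be used when demultiplexing is active and reactivation of
    system ids is disabled."""
    forbidden_context = (
        options["demultiplexing"] == "active"
        and not options["reactivate_system_ids"]
    )
    return not (forbidden_context and options["use_active_hints"])


# Named constraints, so that a violation can be reported by name.
CONSTRAINTS = {"active hints disallowed": active_hints_rule}


def violated_constraints(options, constraints=CONSTRAINTS):
    """List the names of all constraints the option settings violate."""
    return [name for name, rule in constraints.items() if not rule(options)]


bad = {
    "demultiplexing": "active",
    "reactivate_system_ids": False,
    "use_active_hints": True,
}
good = dict(bad, use_active_hints=False)

bad_report = violated_constraints(bad)
good_report = violated_constraints(good)
```

A modeling tool can evaluate such predicates while the model is being edited, which is what allows invalid option combinations to be rejected before any configuration file is generated.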

The Benchmark Generation Modeling Language (BGML) [4] paradigm
models (1) how clients and servers interact with each other and (2) the
metrics that can be applied to specific configuration options and platforms.
Figure 3 illustrates the BGML modeling paradigm.

We also developed one model interpreter for each of CCMPerf’s two
paradigms. The OCML model interpreter generates configuration files to config- 
ure the underlying middleware. For ACE, TAO, and CIAO these configuration 
files conform to a specified format that can be parsed by the middleware. The 
BGML model interpreter parses the model and synthesizes the code required to 
benchmark the modeled configuration. 

2.2 Integrating CCMPerf into the Skoll QA Process 

Figure 2 presents an overview of how we have integrated CCMPerf with the
existing Skoll infrastructure. 

A. A QA engineer defines a test configuration using CCMPerf models. The
necessary experimentation details are captured in the models, e.g., the ORB
configuration options used, the IDL interface exchanged between the client and
the server, and the benchmark metric performed by the experiment.

B & C. The QA engineer then uses CCMPerf to interpret the model. The
OCML paradigm interpreter parses the modeled ORB configuration options and
generates the required configuration files to configure the underlying ORB. The
CCMPerf paradigm interpreter then generates the required benchmarking
code, i.e., IDL files, the required header and source files, and necessary script
files to run the experiment. Steps A, B, and C are integrated with Step 1 of the
Skoll process described in Section 1.

D. When users register with the Skoll infrastructure they obtain the Skoll client
software and configuration template. This step happens in concert with Steps 2,
3, and 4 of the Skoll process.

E & F. The client executes the experiment and returns the result to the Skoll
server, which updates its internal database. When prompted by developers, Skoll
displays execution results using an on-demand scoreboard. This scoreboard
displays graphs and charts for QoS metrics, e.g., performance graphs, latency
measures, and footprint metrics. Steps E and F correspond to steps 5, 6, and 7 of
the Skoll process.

3 Feedback-Driven, Model-Integrated Skoll: A Case 
Study 

Study motivation and design. Measuring QoS for a highly configurable sys- 
tem such as CIAO involves capturing and analyzing system performance in terms 
of throughput, latency, and jitter across many different system configurations 
running on a wide range of hardware, OS, and compiler platforms. We treat 
this problem as a large-scale scientific experiment, i.e., we rely on design of 
experiments theory to determine which configurations to examine, how many 
observations to capture, and the techniques needed to analyze and interpret the 
resulting data. We use the CCMPerf modeling tools presented in Section 2 to
model configuration parameters and generate benchmarking code that measures
and analyzes the QoS characteristics. Using the collected data, we derive two
categories of information: (1) platform-specific information, which describes
behavior that differs on particular platforms, and (2) platform-independent
information, which describes behavior that is common across
a range of platforms. This information can then be fed back into the models to
specify QoS characteristics at model construction time.

To make this discussion concrete, we present a simple example of our approach
(in production systems these experiments would be much larger and
more complex). This experiment measures only one aspect of QoS: round-trip
throughput, calculated at the client side as the number of events processed per second.
We then use the OCML paradigm described in Section 2.1 to model the software 
configuration options that set the request processing discipline within the ORB. 
All other options are simply set to their default values. 

For this study, we modeled a leader/followers [8] request processing approach, 
where a pool of threads take turns demultiplexing, dispatching, and processing 
requests via a thread-pool reactor (TP_Reactor) [8]. In this scheme the following 
two configuration parameters can be varied to tune the QoS characteristics: 






A.S. Krishna et al. 



— The thread-pool size determines the number of threads in the ORB’s 
TP_Reactor that will demultiplex, dispatch, and process requests and 

— The Scheduling policy determines how the threads take turns in process- 
ing requests. We use two scheduling policies for the experiment: (1) FIFO 
scheduling, where the thread that enters the queue first processes the request
first, and (2) LIFO scheduling, where the thread entering last processes the
request first. 
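
The difference between the two policies can be illustrated with a deliberately simplified, single-threaded simulation of the queue of waiting threads. This is a sketch under strong assumptions (requests arrive one at a time, and a thread rejoins the queue as soon as it finishes); it does not model the ORB's TP_Reactor, and the function name `simulate` is hypothetical.

```python
from collections import Counter, deque


def simulate(policy, num_threads, num_requests):
    """Count how many requests each thread handles under a policy.

    policy: "FIFO" (longest-waiting thread goes first) or
            "LIFO" (most recently queued thread goes first).
    """
    waiting = deque(range(num_threads))  # thread ids in arrival order
    handled = Counter()
    for _ in range(num_requests):
        if policy == "FIFO":
            thread = waiting.popleft()   # thread that entered first
        else:
            thread = waiting.pop()       # thread that entered last
        handled[thread] += 1
        waiting.append(thread)           # thread rejoins the queue
    return handled


fifo_counts = simulate("FIFO", num_threads=4, num_requests=8)
lifo_counts = simulate("LIFO", num_threads=4, num_requests=8)
```

Under these assumptions FIFO rotates through all four threads, while LIFO keeps handing requests to the same thread. The simulation says nothing about throughput; it only shows how the choice of policy changes which thread runs next.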

Execution testbed. We chose the following four testbeds with varying hard- 
ware, OS, and compiler configurations. Table 1 summarizes the key features of 
these four testbed platforms. 

Table 1. Testbed Summary

                 DOC         ACE         Lindy           Tango
CPU              AMD         AMD         Intel           Intel
Type             Athlon      Athlon      Xeon            Xeon
Speed (GHz)      2           2           2.4             1.9
Cache (KB)       512         512         1024            2048
Compiler (gcc)   3.2.2       3.3         3.3.2           3.3.2
OS (Linux)       Red Hat 9   Red Hat 8   Fedora Core 1   Debian



Study execution. To identify the influence of scheduling policy and thread
pool size on round-trip throughput, we conducted an experimental task that
used two components communicating with each other. The Skoll system then
distributed the experimental tasks to clients running on the four platforms. Each
task involved 250,000 iterations. Skoll continued distributing the tasks until the
entire experimental design was completed. For example, on the machine called
DOC (see Table 1), different clients and servers executed every combination of
FIFO and LIFO policies with the number of request processing threads set to 2
and 4. Each combination can be categorized as a tuple (t1, t2), where t1 denotes
the number of threads used on the client and t2 the number of threads used on the
server.
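
The set of tasks for one machine can be enumerated as a full factorial design. The sketch below assumes that the client and server thread counts vary independently over the values 2 and 4 named above; the text does not state this precisely, so the resulting design, like the function and key names, should be read as one possible interpretation.

```python
from itertools import product

POLICIES = ("FIFO", "LIFO")
THREAD_COUNTS = (2, 4)  # values named in the text for this example


def experimental_design(policies=POLICIES, thread_counts=THREAD_COUNTS):
    """Enumerate every (policy, t1, t2) combination, where t1 is the
    client thread count and t2 is the server thread count."""
    return [
        {"policy": policy, "client_threads": t1, "server_threads": t2}
        for policy, t1, t2 in product(policies, thread_counts, thread_counts)
    ]


design = experimental_design()
for task in design:
    print(task)
```

With two policies and two thread counts on each side, this yields eight tasks per machine, which a server could hand out one at a time until the design is complete.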

Study analysis. The Skoll infrastructure provides the framework for automat- 
ically collecting data that can then be analyzed to glean platform-specific and 
platform-independent information. The experimental results shown in Figure 5 
were obtained by using Skoll to run experiments on each platform described in 
Table 1. Our analysis of the results in Figure 5 yielded the following observations: 



• Observation 1. On average, LIFO scheduling yielded higher throughput
than FIFO scheduling. Our analysis suggests that this occurs because
the LIFO strategy increases cache thread affinity, so that cache lines are not
invalidated after every request. This observation holds for all the platforms
on which we conducted the experiment, though it is obviously influenced by the
underlying hardware, OS, and compiler platforms. Treating our testbed platforms as a
complete configuration space, this information is platform-independent, i.e., the
LIFO strategy yields higher throughput when the leader/followers model for
request processing is chosen. Such information helps developers understand the
first-order effects of different configuration options.

Fig. 5. Results Summary for Study Execution

Table 2. Platform-specific Information

                     Throughput (events/sec)
Machine   Tuple      FIFO        LIFO
DOC       (4,2)      10,470      9,354
ACE       (4,2)      10,152      10,087
Lindy     (4,2)      11,057      9,982
Tango     (4,2)      10,196      10,067
          (6,2)      14,696      12,556

• Observation 2. While Observation 1 holds in the general case, we detected
finer effects, as shown in Table 2: for specific test cases, FIFO produces
higher throughput than LIFO. In particular, whenever the number of server
threads is low (i.e., 2), FIFO performs as well as or better than LIFO. Moreover,
the degree of improvement increases as the number of client threads increases.
This platform-specific information holds only for certain configurations in our
configuration space, which consists of the hardware, OS, compiler, and software
configuration options.
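
The size of the FIFO advantage can be read directly off Table 2. The short sketch below copies the table values and computes the difference in events/sec for each row; the row for tuple (6,2) has no machine name in the table, so it is recorded as `None` here rather than guessed.

```python
# (machine, (client threads, server threads), FIFO, LIFO), from Table 2.
TABLE_2 = [
    ("DOC", (4, 2), 10470, 9354),
    ("ACE", (4, 2), 10152, 10087),
    ("Lindy", (4, 2), 11057, 9982),
    ("Tango", (4, 2), 10196, 10067),
    (None, (6, 2), 14696, 12556),  # machine column is blank in the table
]


def fifo_advantage(rows):
    """Return (machine, tuple, FIFO minus LIFO) for every row."""
    return [
        (machine, threads, fifo - lifo)
        for machine, threads, fifo, lifo in rows
    ]


advantage = fifo_advantage(TABLE_2)
for machine, threads, diff in advantage:
    print(machine, threads, diff)
```

Every row has two server threads and a positive difference, and the largest difference belongs to the row with six client threads, which matches the trend stated in Observation 2.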

Lessons learned. Although this feasibility study was purposely simplified, it
indicates how the model-integrated Skoll framework enables more powerful
experiments that help identify performance bottlenecks and provide general
guidelines to developers and users of software systems. The platform-specific and
platform-independent information helps in developing reusable configurations that
maximize QoS for a given operational context. The formally designed experiment illustrated
how our Skoll framework can be used to codify these reusable configuration
solutions. These configurations, when validated across a range of hardware, OS,
and compiler platforms, represent a Configuration & Customization (C&C) [9]
pattern.



4 Related Work 

This section compares our work on model-driven performance evaluation
techniques in Skoll and CCMPerf with other related research efforts that use
empirical data and mathematical models to identify performance bottlenecks.
For example, the ATLAS [10] numerical algebra library uses an empirical
optimization engine to decide the values of optimization parameters by generating
different program versions that are run on various hardware/OS platforms. The
output from these runs is used to select parameter values that provide the
best performance. Mathematical models are also used to estimate optimization
parameters based on the underlying architecture, though empirical data is not
fed into the models to refine them.

Like ATLAS, CCMPerf uses an optimization engine to configure/customize
middleware parameters in accordance with available OS platform
characteristics (such as the type of threading, synchronization, and demultiplexing
mechanisms) and characteristics of the underlying hardware (such as the type
of CPU, amount of main memory, and size of cache). CCMPerf enhances ATLAS,
however, by feeding platform-specific information back into the models to
identify performance bottlenecks at model construction time. This information
can be used to select optimal configurations ahead of time that maximize
QoS behavior.

Other research initiatives to validate whether software components meet QoS
requirements include automatic validation techniques [11] for J2EE components using Aspects.
In this approach, agents conduct validation tests at run-time, such as
single-client response time, functional operation, and data storage/retrieval tests. Our
approach in Skoll differs from run-time validation in that all our testing is done
offline. Using our approach, no cost is incurred at deployment time for the
component. Further, extensive testing and QoS behavior analysis are done on a range
of hardware, OS, and compiler platforms to identify performance bottlenecks by
modeling the operational context and synthesizing scaffolding code. The results
are then used to select an optimal configuration at deployment time rather than
incurring the overhead of run-time monitoring.

5 Concluding Remarks 

Reusable software for performance-intensive systems increasingly has a multi- 
tude of configuration options and runs on a wide variety of hardware, com- 
piler, network, OS, and middleware platforms. Our work on Skoll addresses two 
key dimensions of applying distributed continuous QA processes to reusable 
performance-intensive software. The Skoll framework described in [1] addresses
software functional correctness issues, e.g., ensuring software compiles and runs
on various hardware, OS, and compiler platforms. The CCMPerf tools described
in this paper address software QoS issues, e.g., modeling and benchmarking
interaction scenarios on various platforms by mixing and matching configuration
options. These model-based QA techniques enhance Skoll by allowing developers
to model configuration/interaction aspects and associate metrics to benchmark
the interaction. These techniques also minimize the cost of testing and profiling
new configurations by moving the complexity of writing error-prone code from
QA engineers into model interpreters, thereby increasing productivity and quality.
Model-based tools such as CCMPerf simplify the work of QA engineers
by allowing them to focus on domain-specific details rather than write source
code.

Abstract. Using explicit tags at the programming language level has
been attracting a lot of attention recently with technologies like xDoclet
[41], which automates EJB [30] related tasks, and .NET attributes [21],
which are a vital part of the .NET framework [34]. However, there are
currently no systematic ways for adding and transforming tag-driven
product line specific constructs. This often results in implementations
that are not very modular and hence are difficult to reuse.

In this paper we introduce Tango, a specialized language for implementing
source-to-source tag-driven transformers for object-based languages in a
systematic way using several layers of abstraction. Tango operates on
a fixed, language-independent, object-based metamodel and divides the
transformation strategy into several well-defined layers, which also makes it
possible to reuse lower-level parts. Tango uses so-called inner tags
to communicate semantics between different transformer modules in a
uniform way.



1 Introduction 

Using tags is an intuitive way to associate custom semantics with elements of 
interest. Variations of the tag concept have been used all around in computer 
science. Explicit tags offer a convenient way to quickly customize a language 
with domain specific constructs reusing its existing features and front-end com- 
piler tools [39]. The programmer does not need to know the way the grammar 
rules evolve, which makes it easy to introduce domain specific customizations for 
application families, or proofs of concepts, in a uniform way. This is preferable to 
extensible grammar [24] approaches, which allow a grammar to evolve by adding 
new constructs which are then mapped to the original kernel grammar via add, 
update, and delete operations on existing production rules. 

We will use the term tagged grammars for language grammars designed to
support explicit tags directly as part of the language. Several general purpose
languages like .NET [34] and Java (JSR 175 [40]) already offer such support.
Tags can be associated with any existing structural element in these languages,
like classes and methods. They are part of the structural element representation
in the AST. Some language technologies, like .NET with its CodeDom API 1,
offer supported programmatic access to the annotated source via a generalized
AST, which can be used to write pre-processing tools that deal uniformly with
explicit tag interpretation 2. For language technologies that do not offer
explicit tag support, tags can be emulated with special comments.

* Mira Mezini, Sven Kloppenburg, and Klaus Ostermann provided useful comments
and suggestions about the early drafts of this paper. The anonymous reviewers'
comments also proved very useful for producing the final version.

There are, however, several drawbacks to using tags, which relate to the
nature of automatic transformations. Usually the semantics of a tag are unclear
unless we have full documentation about the tag, that is, unless we know exactly
how the tag-driven transformer will handle it. The side-effects that a tag
transformer introduces also present a problem. Despite the benefits of using
explicit tags, these problems make it difficult to use tags in cases where
interactions of transformers [35] must be taken into account. It is therefore
important to implement tag transformers in ways that allow their transformation
strategy to be grasped quickly, so that it becomes easier to understand and
reuse them.

There are several good general purpose transformation frameworks like
Stratego [10], DMS [13] and TXL [19] which could be used to address domain
specific transformations. These general purpose frameworks have an open
metamodel, meaning that we can map any specific language to them. In this
context such tools could also be used with explicit tags. However, the
generality of these frameworks offers no way to abstract the transformation
strategy according to a domain of interest. If the domain is fixed, then the
chances of abstracting the transformation strategy into different levels grow.
We introduce here Tango 3, a specialized framework for implementing
source-to-source tag-driven transformers. Tango supports only languages whose
metamodels conform to a generalized object-oriented (OO) class-based metamodel
with explicit tag annotations. Not every transformation is possible with this
model. However, Tango’s common metamodel is useful for many transformations
that appear in the context of tag specialized constructs for product line [4]
applications.

2 Tango Framework Concepts 

The Tango framework is designed for quickly adding tag-based product line [18]
domain specific constructs to existing object-oriented (OO) languages, reusing
their core functionality.

Fixed Metamodel. Using tags to decorate class entities is similar in
different OO languages. Tango achieves language independence using two concepts
which encapsulate language dependent details:

1 A third party implementation for C# is CS CodeDom Parser [14]. 

2 Alternatively, tags are saved as part of the metadata and can be processed
later using reflection-like APIs.

3 The name Tango was coined in the following way: tag → tag-go → taggo → tago →
tango.
First, all Tango transformers work with a unified internal meta-graph model
of the input source, called the Class Template model. It is a generalized,
tag-annotated common model of the structural AST for a generic OO language. The
model is focused on the class construct and is restricted in the structural
elements it can contain. Currently it supports only classes, field attributes
and methods. This model has been used to define product line specific
constructs in MobCon [29], a framework for mobile applications. How the source
code of a specific language is mapped to this model is not addressed directly
by Tango. Such mappings can be complicated to implement for a given language,
but we can take advantage of the fact that tag-driven transformations are often
used sparingly in a project, only in well-defined component structures. Only
the parts of the metamodel needed by the transformations need to be mapped,
ignoring for example full namespace support. Advanced users may extend the
framework and change its metamodel by mapping new language features to it.
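
To make the restricted metamodel concrete, the following is a minimal sketch of how a class template with classes, fields, methods and tags might be represented. It is an illustration only, not Tango's actual data structure: the type names (TaggedNode, FieldTemplate, MethodTemplate, ClassTemplate) and the representation of tags as a name-to-value map are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical base node: every structural element carries tags (name -> value). */
abstract class TaggedNode {
    final String name;
    final Map<String, String> tags = new LinkedHashMap<>();

    TaggedNode(String name) {
        this.name = name;
    }

    /** Attaches a tag; valueless tags such as dp.pk use an empty string. */
    void tag(String key, String value) {
        tags.put(key, value);
    }
}

/** A field attribute with its declared type. */
final class FieldTemplate extends TaggedNode {
    final String type;

    FieldTemplate(String type, String name) {
        super(name);
        this.type = type;
    }
}

/** A method; its body is kept as opaque source text. */
final class MethodTemplate extends TaggedNode {
    final String returnType;
    final String body;

    MethodTemplate(String returnType, String name, String body) {
        super(name);
        this.returnType = returnType;
        this.body = body;
    }
}

/** The class construct, the only container in this restricted model. */
final class ClassTemplate extends TaggedNode {
    final List<FieldTemplate> fields = new ArrayList<>();
    final List<MethodTemplate> methods = new ArrayList<>();

    ClassTemplate(String name) {
        super(name);
    }

    /** Builds part of the GameScore input of Fig. 3 by hand (no parser involved). */
    static ClassTemplate gameScoreExample() {
        ClassTemplate ct = new ClassTemplate("GameScore");
        ct.tag("dp", "");
        ct.tag("validate", "");
        ct.tag("property", "");

        FieldTemplate score = new FieldTemplate("int", "score");
        score.tag("dp.pk", "");
        score.tag("validate.min", "0");
        score.tag("validate.max", "100");
        score.tag("dp.sort", "asc");
        score.tag("property.accessor", "");
        ct.fields.add(score);

        FieldTemplate comment = new FieldTemplate("String", "Comment");
        comment.tag("property.both", "");
        ct.fields.add(comment);
        return ct;
    }
}
```

Because namespaces, inheritance and statements are left out, a language mapping only has to populate these three node kinds, which is the simplification the paragraph above relies on.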

Second, Tango’s abstractions are isolated from the rest of the transformer
implementation via the concept of a ’code snippet’ 4, which is similar to
Stratego’s [10] concrete syntax. A code snippet is a source code node atomic to
Tango. It is the only part of the code which is language grammar dependent.
Tango requires that the implementation has a way to replace parameters inside
the code snippet using templates. An example of a template code snippet that
uses the Apache Velocity [20] script language is shown in Fig. 1. This code
generates the body of method toRecord in Fig. 4. The details of the Velocity
language are, however, outside the scope of this paper. Code snippets allow
representing parameterized clusters



code StoreFields($fieldArray) language(Velocity) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    DataOutputStream outputStream = new DataOutputStream(baos);
    #foreach($field in $fieldArray)
        outputStream.$dp_write($field.type)
            (this.$Tango.makeMethodName(["get", $field.name]()));
    #end
    return baos.toByteArray();

    #macro(dp_Write, $type)
        if ($type == string) UTF
        if ($type == int) Int
    #end
}

Fig. 1. Code Snippet Example



of source graph nodes without dealing with details of such nodes in the rest of 
transformer implementation. Similar approaches include syntactic unit trees [26] 
and various template and macro based approaches [15,16]. Tango code snippets
differ from such approaches because they are restricted to representing only
non-structural elements, for example a block of code inside a method. They can
also be labeled with inner tags (see below) and reprocessed later as atomic
elements of the AST.

4 This term is borrowed from the .NET CodeDom API [31].
Controlling Semantics with Inner Tags. Tango introduces the concept
of inner tags, which are a natural extension of explicit tags. They are used
only in the inner operations of tag-driven transformers and offer a convenient
means of specifying the coupling sets between composed transformers by removing
the need to reinterpret the code. Inner tags are similar to ASF+SDF [25]
placeholders for saving intermediate results. However, inner tags offer a
generic model for controlling custom semantics that is integrated uniformly
with the rest of the tag-driven transformer operations in Tango. Programmers
deal with inner tags in the same way they process explicit tags. All of Tango’s
basic edit operations can attach inner tags to the entities they modify.

While inner tags carry inter-transformer interaction semantics, we should
note that the alternative is to re-process the AST in each transformer to see
if it fulfills a given condition. Using inner tags does not increase
transformer coupling, which remains the same. Inner tags only make the coupling
declarative, avoiding reprocessing. In this context inner tags are used to
create arbitrary graph node sets [28], which may not correspond directly to the
nesting structure of the generalized AST graph. With inner tags we are also
able to associate more than one label with a node (group) and select the node
in more than one set.
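
The idea of node sets that cut across the AST nesting can be sketched as follows. This is an assumption-laden illustration rather than Tango's implementation: LabeledNode, InnerTagSets and the "traced" label are hypothetical names, and the transformers are simulated by labeling the nodes directly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical AST node that carries explicit tags (from source) and inner tags. */
final class LabeledNode {
    final String name;
    final Set<String> explicitTags = new LinkedHashSet<>();
    final Set<String> innerTags = new LinkedHashSet<>();

    LabeledNode(String name) {
        this.name = name;
    }

    /** Explicit and inner tags are queried in exactly the same way. */
    boolean hasTag(String tag) {
        return explicitTags.contains(tag) || innerTags.contains(tag);
    }
}

final class InnerTagSets {
    /** Returns the names of all nodes in the set identified by the given tag. */
    static List<String> select(List<LabeledNode> nodes, String tag) {
        List<String> names = new ArrayList<>();
        for (LabeledNode n : nodes) {
            if (n.hasTag(tag)) {
                names.add(n.name);
            }
        }
        return names;
    }

    /** Simulates the state after two transformers have labeled their output. */
    static List<LabeledNode> afterTwoTransformers() {
        LabeledNode getScore = new LabeledNode("getScore");
        getScore.innerTags.add("property.accessor"); // left by a Property-like step
        getScore.innerTags.add("traced");            // hypothetical second label
        LabeledNode toRecord = new LabeledNode("toRecord");
        toRecord.innerTags.add("traced");
        return Arrays.asList(getScore, toRecord);
    }
}
```

Here getScore belongs to two sets at once, and a later transformer can pick up either set with a plain tag lookup instead of re-analysing method bodies.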

Layers. Tango’s fixed metamodel allows dividing the transformation strategy
into several layers corresponding to the elements of the metamodel. This allows
reasoning at different levels of abstraction when we want to understand the
transformation strategy. A transformer implementation in Tango is organized in
several hierarchical layers: the workflow layer, the transformers layer, the
member constructors layer and the code snippets layer, as shown in Fig. 2. Each
layer uses the



[Figure: four stacked layers, from top to bottom: Workflow, Transformers,
Member Constructors, Code Snippets. Each layer uses the layer below it.]

Fig. 2. Tango Framework Layers



elements of the next lower layer, but cannot create elements of the upper
layer. For example, class templates are only created in the workflow layer. The
transformers layer can only modify the class templates but not create new ones.
This way we can know which class templates take part in a transformation just
by examining the workflow layer. In the same vein, member constructors (see
Section 3) are used in the transformers layer only. Finally, code snippets are
used only by member constructors.

Each layer defines part of the transformation strategy customized to its
elements. For example, the workflow strategies are similar to Stratego’s [10]
transformation strategies but simpler. Unlike Stratego, which is a general
purpose transformation framework, Tango’s workflow operates only on class
templates. The simplicity is a consequence of the specialization for the class
layer of the metamodel. Strategies of the other lower layers also correspond
roughly to some Stratego [10] traversal strategies.

Traceability. Traceability is integrated in the Tango framework. It can be 
turned on and off for pieces of source code of interest by using a special explicit 
tag directly in source code. When present, this tag instructs the framework to 
add special log methods to all methods that are transformed by any Tango 
transformer. The log methods are decorated with special tags containing all 
the transformer names which have edited the method. When the output code is
executed, the trace statements are printed on the console, showing the exact
transformation history of each executing method. This centralized tracing
capability helps in debugging transformation related side-effects.



3 A Feeling of Tango: Transformation Example 

In this section we introduce transformer implementation in Tango using the 
example of Fig. 3 which will be transformed to the code of Fig. 4. This example 
is a tag based implementation of the standard GameScore example which comes 
with the J2ME MIDP [6] documentation. The input code fields are decorated with
explicit tags in the form of JavaDoc [33] comments. Tags will be part of the generalized
AST field elements. The following attribute classes have been used in the code 
of Fig. 3: (a) property - adds accessor / mutator methods for a field (b) validate 
- adds validation code for fields that have mutator methods; min, max show the 
required range for an integer or the required length ranges for a string field (c) 
dp - adds data persistence methods to the component and allows the records to 
be retrieved sorted. For more details see [29]. 

The details of the specific Tango syntax shown here may change as the frame- 
work is still being developed. Check [36] for the latest version. 

The Workflow. The workflow layer defines the structure of a composite
transformer at the class level. The workflow operations work only with
class templates and transformers. Tango transformers are functions of the form
$\tau : \langle ct_{i_1}, \ldots, ct_{i_n} \rangle \rightarrow \langle ct_{o_1}, \ldots, ct_{o_m} \rangle$,
that is, they take one or more class template arguments and output one or more
class templates. This allows us to compose transformers uniformly, unlike
systems like JMangler [12] that make a distinction between individual
transformers and mergers. The Tango transformation workflow for the example
above is listed in Fig. 5. The class template is initialized from a source file
in line 2. The implementation of the transformer is split based on the three
classes of attributes that are present: Property, Validation, Persistence. The
composition operator (’,’) is used in line 3. This composition is equivalent to
the following functional representation:
’$CT = Persistence(Validation(Property($CT)))’, which can also be used
directly. The implementation of the transformers is read from the file
’example.tgt’ (line 1). The transformed class template is saved to source code
in line 4.
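
The equivalence between the ',' operator and nested function application can be sketched with ordinary function composition. This is a minimal sketch under stated assumptions: SimpleTemplate stands in for a class template, each transformer merely appends one member name, and the names adding, sequence and run are hypothetical helpers, not Tango operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Stand-in for a class template: a name plus the member names it holds. */
final class SimpleTemplate {
    final String name;
    final List<String> members = new ArrayList<>();

    SimpleTemplate(String name) {
        this.name = name;
    }
}

final class Workflow {
    /** Builds a transformer that appends one member to every template it receives. */
    static Function<List<SimpleTemplate>, List<SimpleTemplate>> adding(String member) {
        return cts -> {
            for (SimpleTemplate ct : cts) {
                ct.members.add(member);
            }
            return cts;
        };
    }

    /** Models the ',' operator: apply the given transformers from left to right. */
    @SafeVarargs
    static Function<List<SimpleTemplate>, List<SimpleTemplate>> sequence(
            Function<List<SimpleTemplate>, List<SimpleTemplate>>... steps) {
        Function<List<SimpleTemplate>, List<SimpleTemplate>> all = Function.identity();
        for (Function<List<SimpleTemplate>, List<SimpleTemplate>> step : steps) {
            all = all.andThen(step);
        }
        return all;
    }

    /** Runs Property, Validation, Persistence on a single GameScore template. */
    static List<String> run() {
        Function<List<SimpleTemplate>, List<SimpleTemplate>> property = adding("getScore");
        Function<List<SimpleTemplate>, List<SimpleTemplate>> validation = adding("validateScore");
        Function<List<SimpleTemplate>, List<SimpleTemplate>> persistence = adding("toRecord");

        List<SimpleTemplate> input = new ArrayList<>();
        input.add(new SimpleTemplate("GameScore"));
        return sequence(property, validation, persistence).apply(input).get(0).members;
    }
}
```

Because every transformer maps a tuple of templates to a tuple of templates, a composed transformer has the same type as its parts, so no separate "merger" concept is needed in this sketch.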
/**
 * @dp
 * @validate
 * @property
 */
public class GameScore {
    /**
     * @dp.pk
     * @validate.min 0
     * @validate.max 100
     * @dp.sort asc
     * @property.accessor
     */
    private int score;

    /**
     * @dp.pk
     * @validate.min 4
     * @validate.max 32
     * @dp.sort asc
     * @property.both
     */
    private String playerName;

    /**
     * @property.both
     */
    private String Comment;
}

Fig. 3. Input Code

public class GameScore {
    private int score;
    private String playerName;
    private String Comment;

    public int getScore() { return score; }

    public void setPlayerName(String value)
    { playerName = value; }

    // ...

    public byte[] toRecord() {
        ByteArrayOutputStream baos =
            new ByteArrayOutputStream();
        DataOutputStream outputStream =
            new DataOutputStream(baos);
        outputStream.writeInt(o.getScore());
        outputStream.writeUTF(o.getPlayerName());
        outputStream.writeUTF(o.getComment());
        return baos.toByteArray();
    }

    // ...
}

Fig. 4. Output Code

1. @using "example.tgt"
2. $CT = read("input.java");
3. $CT = Property, Validation, Persistence;
4. write($CT, "output.java")

Fig. 5. Example Workflow File



Other supported workflow operations not shown in this example include: the
try operation ’$CT = ?(T1, T2 | T3);’ - where ’T3’ is tried only if ’T1, T2’
fails; the creation of empty class templates ’$U = CT("Utility");’ - which
creates a new class template named Utility; and cloning ’$CT1 = CT2’ - which
sets CT1 to be a deep copy (not a reference) of CT2.
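
One way to read the try operation is as a fallback combinator over transformers that may fail. The sketch below is not Tango's semantics as implemented: it assumes that a failed transformer is modeled by an empty Optional, that the alternative runs on the original (unmodified) input, and it uses immutable strings in place of class templates. The names TryOp, seq, orElseTry and demo are hypothetical.

```java
import java.util.Optional;
import java.util.function.Function;

final class TryOp {
    /** Sequence 'a, b': stops at the first transformer that does not apply. */
    static <T> Function<T, Optional<T>> seq(Function<T, Optional<T>> a,
                                            Function<T, Optional<T>> b) {
        return in -> a.apply(in).flatMap(b);
    }

    /** Try '?(primary | alt)': the alternative is used only if the primary fails. */
    static <T> Function<T, Optional<T>> orElseTry(Function<T, Optional<T>> primary,
                                                  Function<T, Optional<T>> alt) {
        return in -> {
            Optional<T> out = primary.apply(in);
            return out.isPresent() ? out : alt.apply(in);
        };
    }

    /** Demo with strings: T2 applies only to inputs that start with "tagged". */
    static String demo(String input) {
        Function<String, Optional<String>> t1 = s -> Optional.of(s + "+T1");
        Function<String, Optional<String>> t2 =
                s -> s.startsWith("tagged") ? Optional.of(s + "+T2") : Optional.empty();
        Function<String, Optional<String>> t3 = s -> Optional.of(s + "+T3");
        return orElseTry(seq(t1, t2), t3).apply(input).orElse("failed");
    }
}
```

With mutable templates, running the alternative on the original input would require the deep-copy cloning operation mentioned above; whether Tango does this internally is not stated in the text.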

Transformers. Fig. 6 shows the complete implementation of the Property
transformer in Tango. Each transformer has a precondition part and an action
part. Editing operations are allowed only in the action part. The split is
intended to state up front the main checks that determine whether a transformer
can be applied. The noapply operator (line 4) tells the Tango workflow that
this transformer cannot be applied. If a transformer fails to apply, control
is returned to the workflow. Theoretically we can define a precondition and a
postcondition for any graph rewriting operation [28], but in practice it is
cumbersome to enumerate them for each operation (postconditions can be written
as preconditions [28]). Tango allows factoring out some preconditions before
all the actions, but this convenience results in an optimistic precondition
requirement: the action part can also make more specific checks, and the
transformer may still fail as a consequence of a failed condition in the action
part.
 1. transformer Property(~ct) {
 2.   precondition {
 3.     if (not check(~ct, tags(["property"])))
 4.       noapply;
 5.   }
 6.   action {
 7.     $fields = select(~ct, FIELDS, tags(["property.*"]));
 8.     if (check($fields, empty())) error;
 9.     iterator ($field in $fields) {
10.       if (check($field, tags(["property.accessor"])
11.           or tags(["property.both"])))
12.         add(~ct, METHOD, GetMethod($field),
13.             [tag(<"property.accessor", $field.name>)]);
14.       else if (check($field, tags(["property.mutator"])
15.           or tags(["property.both"])))
16.         add(~ct, METHOD, SetMethod($field),
17.             [tag(<"property.mutator.", $field.name>)]);
18.     }
19.   }
20.   return ~ct; // optional: the first argument is returned
21. }

Fig. 6. The Property Transformer Implementation



The argument of the transformer is a class template (~ct) (line 1). The actions 
part will modify this class template to add getter and setter methods for all fields 
decorated with some form of property tag. The fields are selected in line 7 and 
stored in (list) variable $fields. The generic select operation is used to filter a 
set of nodes of the same type that fulfill a given condition. Only predefined types 
of the supported metamodel, like FIELDS, METHODS, etc., can be used. The third
argument of select is a condition. Several predefined conditions are supported: 
’tag’ - filters nodes based on tags, ’name’ - filters based on names, ’empty’ - checks 
for empty list, and ’count’ - checks the number of list items. For conditions that 
expect string arguments, regular expressions are supported. Users can define 
additional custom conditions. 

The check operation (line 8) is similar to select. It applies all the conditions 
given to it to the first argument and returns a boolean value indicating whether 
conditions have succeeded or not. The individual conditions can be combined 
using boolean operators: and, not, or. The check and select operations pass 
the first argument implicitly to all conditions. This makes it easier to understand 
what a check or select statement does, given that all conditions work on the first 
argument of check or select, at the cost of a slightly slower implementation.
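
The way conditions receive the first argument implicitly can be sketched with composable predicates. The sketch is illustrative only: FieldNode and Conditions are hypothetical names, only the 'tags' and 'name' conditions are modeled, and treating the string argument as a regular expression that must match a whole tag is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.regex.Pattern;

/** Hypothetical field node: a name plus the tags attached to it. */
final class FieldNode {
    final String name;
    final Set<String> tags;

    FieldNode(String name, String... tags) {
        this.name = name;
        this.tags = new LinkedHashSet<>(Arrays.asList(tags));
    }
}

final class Conditions {
    /** 'tags' condition: true if any tag of the node fully matches the regex. */
    static Predicate<FieldNode> tags(String regex) {
        Pattern p = Pattern.compile(regex);
        return f -> {
            for (String t : f.tags) {
                if (p.matcher(t).matches()) {
                    return true;
                }
            }
            return false;
        };
    }

    /** 'name' condition: true if the node name fully matches the regex. */
    static Predicate<FieldNode> name(String regex) {
        Pattern p = Pattern.compile(regex);
        return f -> p.matcher(f.name).matches();
    }

    /** select: iterates implicitly and keeps the nodes that fulfill the condition. */
    static List<FieldNode> select(List<FieldNode> nodes, Predicate<FieldNode> cond) {
        List<FieldNode> out = new ArrayList<>();
        for (FieldNode n : nodes) {
            if (cond.test(n)) {
                out.add(n);
            }
        }
        return out;
    }

    /** check: applies the (possibly combined) condition to its first argument. */
    static boolean check(FieldNode node, Predicate<FieldNode> cond) {
        return cond.test(node);
    }

    /** The three fields of Fig. 3 with their tag names (tag values omitted). */
    static List<FieldNode> fig3Fields() {
        return Arrays.asList(
            new FieldNode("score", "dp.pk", "validate.min", "validate.max",
                          "dp.sort", "property.accessor"),
            new FieldNode("playerName", "dp.pk", "validate.min", "validate.max",
                          "dp.sort", "property.both"),
            new FieldNode("Comment", "property.both"));
    }
}
```

A combined condition such as line 10 of Fig. 6 then reads `check(field, tags("property.accessor").or(tags("property.both")))`: the node is named once and every condition in the expression is evaluated against it.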

The iterator operation (line 9) is used to apply an operation over all elements 
of a list. Note that iterator and select could be a single operation, however 
it makes sense to separate these operations in cases when we need to process 
lists other than those returned by select. We have divided the iterations over
the metamodel AST between the ’check’, ’select’, and ’iterator’ operations. The check and
select operators do implicit iterations over finite lists of elements. Again, this is 
preferable because it makes it easier to understand what a loop does, at the cost 
of slightly less efficient implementation. 

The add operation (lines 12, 16) is an example of an edit operation.
It adds a meta element to the class template given as its first argument.
GetMethod and SetMethod are member constructors (see below). All supported
edit operations in the transformers layer (add, delete, modify) work on
class templates and use member constructors of the lower layer. In the
example the add operation also adds an inner tag to the newly added element:
[tag(<"property.accessor", $field.name>)]. The tag is created by the explicit
tag constructor tag, combining the field tag type and the field name using
the string concatenation operation <...>. This inner tag can be used later in a
select or check statement of another transformer, in the same way as an explicit
tag.

Note the pattern used in add and select. We require the type of meta elements 
to be given explicitly. In the case of add, the type could have been deduced from 
the member constructor type. We do this because it improves the readability of 
the code. An alternative would be to have distinct operation names for each meta 
element type, like addMethod. However, this would require changing the Tango
parser whenever its fixed metamodel is customized. The modified class template is
returned in line 20. For an implementation of the two other transformers of this 
example see [36]. 

Member Constructors Layer. This layer defines the actual metamodel 
member implementations that are used in the transformers layer. Class templates 
cannot be passed as arguments to the member constructors. Currently Tango 
defines member constructors for fields, methods and tags. Method bodies are 
represented as a list of tag decorated code snippets divided into three logical 
blocks: begin, middle and end (Fig. 7). When an existing method is parsed in 
code, its method body is represented as a single code-snippet node in the middle 
block. Inner tags can be used to decorate snippets of code. Users can select 




| begin | middle | end |

Fig. 7. Method Body as a List

1. method ToRecord($fieldArray) {
2.   methodName(makeMethodName(["toRecord"]));
3.   methodReturn(ByteArray());
4.   addBody(StoreFields($fieldArray));
5. }

Fig. 8. Method Constructor Example



the elements by their tags and insert new or remove existing code snippets in
any block of the method body list. Fig. 8 shows a member constructor
for a method used in the persistence transformer. No direct code is written in
this layer; instead, code snippet constructors like ByteArray and StoreFields
are called (lines 3, 4). The method name for a newly created method is set in
line 2 and its return type is set in line 3 (as a string of code returned by a
code snippet). The method body is produced by the code snippet constructor
StoreFields (Fig. 1). The addBody operation adds this code snippet at the end
of the middle block.
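
The three-block method body can be sketched as three lists of tagged snippets. This is a minimal illustration, not Tango's API: Snippet, MethodBody, removeTagged and render are hypothetical names, and only addBody mirrors an operation named in the text (appending to the end of the middle block).

```java
import java.util.ArrayList;
import java.util.List;

/** A code snippet: opaque source text plus an optional inner tag (may be null). */
final class Snippet {
    final String code;
    final String innerTag;

    Snippet(String code, String innerTag) {
        this.code = code;
        this.innerTag = innerTag;
    }
}

/** Method body as three ordered blocks of snippets: begin, middle, end. */
final class MethodBody {
    final List<Snippet> begin = new ArrayList<>();
    final List<Snippet> middle = new ArrayList<>();
    final List<Snippet> end = new ArrayList<>();

    /** Mirrors addBody: appends the snippet at the end of the middle block. */
    void addBody(Snippet s) {
        middle.add(s);
    }

    /** Removes every snippet carrying the given inner tag; returns how many. */
    int removeTagged(String tag) {
        int before = begin.size() + middle.size() + end.size();
        begin.removeIf(s -> tag.equals(s.innerTag));
        middle.removeIf(s -> tag.equals(s.innerTag));
        end.removeIf(s -> tag.equals(s.innerTag));
        return before - (begin.size() + middle.size() + end.size());
    }

    /** Emits the source text: begin, then middle, then end, one snippet per line. */
    String render() {
        List<String> lines = new ArrayList<>();
        for (Snippet s : begin) lines.add(s.code);
        for (Snippet s : middle) lines.add(s.code);
        for (Snippet s : end) lines.add(s.code);
        return String.join("\n", lines);
    }
}
```

A parsed method would start with a single untagged snippet in the middle block; a later transformer can then add, say, a logging snippet to the begin block and remove it again by its inner tag without touching the original body.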
4 Related Work



Variations of tags have been used in many areas in computer science including 
transformation tools. Tags can be used to reduce [5] transformation search space 
[22]. Here we are interested in explicit tag usage at the source code level, as a 
reusable way to introduce product line domain specific extensions to existing 
general purpose OO languages.

Tango reflects previous experience with a tag-based framework implemented 
as part of MobCon [29], a specialized generative framework for Java 2 Micro 
Edition, Mobile Information Device Profile (J2ME MIDP) [6]. Several concepts 
generalized in Tango, like the fixed metamodel and inner tags, were used in the
MobCon use case. Tango adds a layered structure and a specialized syntax to ease
using such constructs.

Tango’s relation to generic open transformation systems like Stratego [10],
TXL [19], DMS [13] and ASF+SDF [25] was stated in several places in this paper.
Unlike such open AST systems, which can be seen as open generalized compilers,
Tango works with a fixed tag annotated object-based metamodel. How specific
metamodel mappings are implemented is not directly addressed by Tango. Tango
also divides the transformation strategy into several abstraction layers.

Multi-stage programming [39] approaches use special annotations to deal 
uniformly with multi-stage transformations. These approaches are more driven 
by the need for optimizations and have a fixed set of well-defined annotations. 
Tango on the other hand, can be used to implement invasive transformers [3] 
at a higher level using a Generalized and Annotated AST (GAAST) [37] repre- 
sentation of the code, which is an AST-like API that preserves the annotations 
done at source level and allows accessing them programmatically. GAAST tools
belong to the category of API based generators [27].

Other approaches generalize some of the ways a transformer works with
an AST-like representation: (a) Filtering related approaches like JPath [32],
which are motivated by the XPath [42] node query language for XML [2]. The idea
is to be able to define XPath-like queries over the AST to declaratively select
sets of nodes of interest. For example, the query
’//class[@name="Class1"]/Method1’ selects and returns all methods named Method1
found in the class named Class1; (b) Mapping related approaches, which build
upon filtering related approaches and are motivated by XSLT [8] and more
recently by the MOF Query View and Transformation [9] proposals. These
approaches define declaratively how to map a set of selected nodes to another
set of nodes, thereby enabling transformation of one graph to another. These
approaches are very general, and Tango could be implemented on top of them.

According to the feature-based categorization of MDA [7] transformation
approaches given in [23], Tango falls into a hybrid of code-to-code and template
transformers with source-oriented deterministic iterative rules. However, given
that the transformation of a marked model to marked code is trivial, the
approach can also be seen as model-to-code. The categorization in [23] is,
however, too general, and many tools could fall into the same category. Tango is
unique for its well-structured layers, extensibility of primitive operations,
and heavy reliance on inner tags.

Tango can be seen as a domain specialization of graph rewriting systems [28].
We can think of the changes that a tag-driven transformer introduces to an
annotated class as a series of primitive operations. Given that the number of
structural elements in a general purpose language is limited, it makes sense to
enumerate such operations. If we have a notation for such operations and write
a sequence of such high-level operations, we end up with a specific language
for implementing tag-driven transformers. The approach is similar to
implementing graph based schema evolutions [17], but specialized for OO
tag-driven transformers.

Aspect Oriented Programming (AOP) [11] deals with ways to modularize 
crosscutting concerns systematically. Tags can be seen as a way to specify join- 
points. Selecting tags in a language that supports them can be done with an 
aspect-oriented enabled tool in the same way as choosing other code elements. 
However, custom tags enforce an explicit programming [1] model. By convention
they are used much like method calls. This is not a limitation in our case
because we use tags to implement domain specific extensions that result in
generative mappings back to the core language.



5 Conclusions and Future Work 

Custom explicit annotations offer an attractive way to introduce product line
[18] domain specific constructs uniformly to general purpose languages, reusing
their front-end compiler tools. This convenience, however, comes with several
problems related to understanding the transformations of the code decorated
with tags. Understanding the implementation relies on being able to reason
about the strategy at various levels of abstraction that correspond to the
language model. Tango is a specialized framework for addressing these problems
for tag-driven transformers that work with a fixed object-based metamodel.
Tango uses layering and restrictions on several operations to promote
understandability over efficiency. This not only increases the chances of
individual transformer reuse, but also enables reuse of elements of lower
layers in more than one element of the more abstract upper layers. Tango’s
inner tags allow treating transformer semantics in a uniform way during
composition.

Future work will concentrate on finalizing the Tango grammar and improving
the prototype. Possible areas of future work are expressing and checking tag
dependencies declaratively [38] and the introduction of non-intrusive transformers.
Abstract. Object-oriented frameworks are sophisticated software artifacts that
significantly impact productivity when building applications in a given domain.
However, frameworks are complex and hard to master, and it remains an open
problem to find a cost-effective solution for documenting them. This paper
presents the case-based approach of FrameDoc to framework documentation and
reuse. By means of explicit knowledge representation and CBR, FrameDoc assists
a novice user of the framework in the process of building new applications by
maintaining a case base of previous framework instantiations from which
relevant past cases can be retrieved and reused. The approach proposes both a
methodology for framework documentation and a tool that helps the user when
reusing a framework through the knowledge represented in the documentation.



1 Introduction 

A framework is a reusable, semi-complete application that can be specialized to pro- 
duce custom applications. Frameworks promote reusability by exploiting the domain 
knowledge and prior effort of experienced developers in order to avoid recreating 
common solutions to recurrent application requirements [10]. Frameworks are com- 
plex, and one of the biggest problems with most frameworks is just learning how to 
use them. In order to achieve the highest degree of reusability and extensibility, 
frameworks are built as sophisticated software artifacts and, therefore, it is not easy to 
understand the design concepts, commitments and decisions involved in the solutions. 

FrameDoc is a tool whose overall goal is to alleviate the aforementioned learning 
effort by means of explicit knowledge representation and Case-Based Reasoning 
(CBR) [12]. We explicitly represent knowledge about the framework domain as well 
as the framework design and implementation, and apply CBR techniques to manage 
the experience acquired in the usage of the framework. 

The rest of the paper is organized as follows: the next section describes the
explicit models underlying FrameDoc documentation; Section 3 describes how
FrameDoc helps in obtaining those models, and Section 4 presents case-based
framework instantiation. Finally, conclusions and future work are presented.
When needed, examples are given from the application of FrameDoc to JHotDraw, a
two-dimensional graphics framework for structured drawing editors that is
written in Java.



2 Domain Model and Framework Design Model 

FrameDoc documentation comes in the form of a set of instantiation cases
represented in terms of a domain model connected to a model of the framework
design. Previous work proposes a methodology for deriving frameworks from
domain models [3], which supports the connection between domain modeling and
framework development. Although some frameworks are shipped with a good domain
model description, in general the implicit domain analysis made by framework
developers gets lost in the resulting artifact [5]. Therefore, in order to
apply FrameDoc, domain and framework models must be reverse engineered from the
available documentation and source code. The next section describes this
process, while the following paragraphs present the models.

In recent years, some effort has been spent on finding solutions for
representing Domain Analysis information, from domain languages to knowledge
taxonomies [6]. However, instead of a special purpose language, FrameDoc employs
UML as a multi-purpose formalism to represent both domain and design
information.

A domain model must be able to represent entities, their attributes, the
relations between them, and the points of variability in the domain. This
information can be represented through UML class diagrams, by extending
classes, methods and attributes with the stereotype «variable» [13]. Tagging an
element as variable implies that this element is optional in the applications
within the domain. Untagged elements, on the other hand, are mandatory in every
domain application. For example, in the drawing editor domain, the Figure entity
is not a variable piece, since all drawing editors work with figures, whereas
the Selection class (entity) must be tagged as variable because some editors do
not use it. In the same way, Figure may have an attribute called shape that is
not variable and an attribute called connectable that is tagged as variable.
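
The mandatory/optional reading of the variable stereotype can be sketched as a simple check of an application against the domain model. This is only an illustration of the rule stated above, not part of FrameDoc: DomainElement, DomainModel and missingMandatory are hypothetical names, and the model contains just the two entities from the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

/** A domain entity; 'variable' is true if it carries the variable stereotype. */
final class DomainElement {
    final String name;
    final boolean variable;

    DomainElement(String name, boolean variable) {
        this.name = name;
        this.variable = variable;
    }
}

final class DomainModel {
    /** Lists the mandatory (untagged) elements that an application does not use. */
    static List<String> missingMandatory(List<DomainElement> model, Set<String> used) {
        List<String> missing = new ArrayList<>();
        for (DomainElement e : model) {
            if (!e.variable && !used.contains(e.name)) {
                missing.add(e.name);
            }
        }
        return missing;
    }

    /** Two entities from the drawing editor example in the text. */
    static List<DomainElement> drawingEditors() {
        return Arrays.asList(
            new DomainElement("Figure", false),    // mandatory in every editor
            new DomainElement("Selection", true)); // optional, tagged as variable
    }
}
```

An editor that uses only Figure satisfies the model, while one that uses Selection but no Figure does not.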

Regarding framework design, UML covers only one part of the design, giving no
support to framework artifacts such as hot spots. In FrameDoc we use stereotypes in
class diagrams to represent additional information about the design of the framework. 
This UML extension is based on UML-F [7]. 

In object-oriented frameworks there are two types of classes: «application»
and «base» classes. Application classes are those classes that only appear in
framework instantiations, and base classes are the core classes of the framework.
Within base classes we can further distinguish between classes whose functionality
should be extended and classes whose functionality should be modified. We
stereotype these classes as «extensible» and «modifiable» classes, respectively.
Finally, class methods that are overridden in subclasses are tagged as
«variation-method» because, in general, those are the points that the framework
designers arranged for functionality extension.






[Figure: DOMAIN MODEL and DESIGN MODEL panels]

Fig. 1. Domain and Design Model.



A key point of the framework model is the connection between domain and design
models. We represent this connection through a UML association tagged as
implemented-by. This relation indicates that some class, method or attribute from the
framework design implements some domain functionality or entity. For example,
suppose that the domain analyst identifies that all drawing editors work with figures
and that these figures are classified as Lines, Polygons and Texts. In a particular
framework such as JHotDraw there are several classes to represent figures, but it is
useful to know at a glance how these classes implement the domain entities. Figure
1 shows a partial view of the domain and design models, including their connections.
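
A minimal sketch of how implemented-by links might be stored and queried in both
directions is given here. The pairs listed are illustrative assumptions chosen for
this sketch; they are not read off Figure 1, and build_indexes is a hypothetical
helper.

```python
from collections import defaultdict

# (domain element, framework element) pairs; all pairings are illustrative.
IMPLEMENTED_BY = [
    ("Figure", "AbstractFigure"),
    ("Line", "LineFigure"),
    ("Line", "PolyLineFigure"),
    ("Polygon", "PolygonFigure"),
    ("Text", "TextFigure"),
]


def build_indexes(pairs):
    """Index the association by domain element and by framework element."""
    by_domain = defaultdict(list)
    by_framework = defaultdict(list)
    for domain, framework in pairs:
        by_domain[domain].append(framework)
        by_framework[framework].append(domain)
    return by_domain, by_framework


BY_DOMAIN, BY_FRAMEWORK = build_indexes(IMPLEMENTED_BY)

print(BY_DOMAIN["Line"])           # framework elements implementing Line
print(BY_FRAMEWORK["TextFigure"])  # domain elements implemented by TextFigure
```

Keeping both indexes makes the "at a glance" lookup cheap in either direction: from a
domain entity to its implementing classes, or from a class back to the domain.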

We use Description Logics (DL) [4] as the underlying representation formalism: a 
TBox bringing all the necessary expressiveness to describe extended UML models, 
and an ABox for actual framework models. 



3 Interactive Domain Modeling in FrameDoc 

Starting from an existing framework, building a model of the domain and a design
model, and finding the relations between them, consists of: identifying the domain
entities, finding the relations between them, reverse engineering the framework design,
identifying the points of variability of the framework, and finally mapping domain
entities into the actual design. Carrying out these hard tasks does not guarantee
success, since our point of view in domain analysis could be far from the developers'
point of view. FrameDoc supports interactive model construction by providing knowledge
extracted from the framework, so that the resulting domain model should be close to
the model implicit in the framework.

The user activities are divided into two main categories: the activities focused on
building the domain model, and the activities dedicated to refining the model and
relating it to the framework internal model. In order to facilitate these activities,
the screen of the FrameDoc domain modeler (see Figure 2) is separated into four
different sections: a) a domain model browser to see and modify the model that the
user is building, b) a domain entity documentation viewer to provide quick access to
this information, c) a name hierarchy browser, and d) a framework class browser. Along
with the main screen, FrameDoc has some dialogs that enable the definition of domain
classes, methods, attributes and relations in an easy way.

Fig. 2. Main screen of FrameDoc domain modeler.

Building the initial domain model: To accomplish this task, the user has to identify
relevant classes (concepts) that appear in the domain. The user can ask for the
most used terms in the framework, which are obtained as clusters of words by applying
statistical techniques to the framework documentation.

Domain model refinement: For this task, the user can ask for similar classes in
the framework design. FrameDoc retrieves the most similar classes, methods or
attributes, as the elements whose textual description is closest to the documentation
written by the user. With the retrieved object, FrameDoc uses the 'has-name-part'
relation defined in the UML TBox to build the hierarchy of terms included in the
framework.
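
The source does not specify which similarity measure FrameDoc uses. As a stand-in, the
following sketch ranks framework elements by bag-of-words cosine similarity between
their documentation and the text written by the user. The documentation strings in
FRAMEWORK_DOCS are invented for illustration, and tokens, cosine and most_similar are
hypothetical helpers.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-case alphabetic tokens of a documentation string."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query: str, docs: dict[str, str]) -> str:
    """Name of the element whose documentation is closest to the query."""
    q = Counter(tokens(query))
    return max(docs, key=lambda name: cosine(q, Counter(tokens(docs[name]))))


# Invented documentation snippets for three framework elements.
FRAMEWORK_DOCS = {
    "Fig": "Graphical object that can be shown and modified in a drawing editor",
    "Tool": "Handles mouse and keyboard events on the drawing view",
    "Handle": "Used to change a figure by direct manipulation",
}

query = "Graphical objects to be shown and modified in a drawing"
print(most_similar(query, FRAMEWORK_DOCS))
```

A production system would normally add stop-word removal, stemming and term
weighting; they are left out to keep the sketch short.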

For example, suppose that the user creates from the suggested terms a new domain 
entity called “Figure” and writes its documentation: “Graphical objects to be shown 
and modified in a drawing”. Next, the user wants to know possible attributes, whether 
Figure should be a variable class, and what classes in the framework implement this 
domain entity. Using FrameDoc domain modeler: 

1. The user asks FrameDoc for available help by clicking on the entity in the domain
model browser.

2. FrameDoc retrieves the class Fig as the element with the most similar
documentation to the domain entity Figure.

3. FrameDoc retrieves the name parts of the Fig class, building the name hierarchy
by searching for all the classes that have Fig in their names. The user has enough
information to conclude that figures can be grouped into Text, Image, Line and
Polygon. These four groups are good candidates for domain entities, and, since
they mainly differ in shape, shape is a good candidate to be a Figure attribute.

4. The user can relate any domain entity to a framework element. From the query in
the model browser, the name hierarchy browser shows relevant name parts and,
by clicking on any of those, the class browser is populated with the classes,
methods and attributes that contain it in their names. The class browser has a
context menu with two items: Implements, which relates a framework element to
the selected domain entity, and Documentation, which shows the documentation of
the element.
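
The name hierarchy used in steps 3 and 4 can be approximated by splitting identifiers
into their camel-case parts and indexing classes by part. This is a sketch under
assumptions: the class names in CLASS_NAMES are hypothetical, and FrameDoc itself
relies on the has-name-part relation in its DL knowledge base rather than on this
code.

```python
import re
from collections import defaultdict


def name_parts(identifier: str) -> list[str]:
    """Split a camel-case identifier, e.g. 'FigText' -> ['Fig', 'Text']."""
    return re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", identifier)


def build_index(class_names):
    """Map every name part to the set of classes whose name contains it."""
    index = defaultdict(set)
    for cls in class_names:
        for part in name_parts(cls):
            index[part].add(cls)
    return index


def co_occurring_parts(index, part: str) -> list[str]:
    """Other name parts found in the classes that contain the given part."""
    others = set()
    for cls in index[part]:
        others.update(p for p in name_parts(cls) if p != part)
    return sorted(others)


# Hypothetical class names, for illustration only.
CLASS_NAMES = ["FigText", "FigImage", "FigLine", "FigPolygon", "SelectionTool"]
INDEX = build_index(CLASS_NAMES)

print(sorted(INDEX["Fig"]))
print(co_occurring_parts(INDEX, "Fig"))
```

The second call lists the parts that accompany Fig, which is the kind of evidence
that suggests Text, Image, Line and Polygon as candidate domain entities.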

The user can complete as many iterations of this process as needed, in order to 
obtain a good domain model. Once the process is finished, two UML models are 
obtained. The first UML model represents the domain conceptualization of the 
framework, and the second one, the UML-F model, represents framework classes, 
methods and attributes, along with the variability points identified in the analysis. As
already mentioned, both models are related through the association implemented-by.



4 CBR View of the Problem 

A well-known approach for framework documentation is the use of cookbooks. In
our approach, each recipe represents a framework instantiation consisting of a
description of the problem and a set of framework instantiation steps to reach the
solution. FrameDoc provides a tool for finding configurable recipes through Case-Based
Reasoning: first, by providing a search engine for finding the most suitable recipe,
and, second, by helping in the adaptation process when the selected recipe does not
exactly match the problem at hand. Both processes are based on the explicit knowledge
representation described in the previous sections: knowledge that allows the user to
state the description of the recipe in domain terms, and knowledge to find and adapt
the solution based on design and implementation information obtained from the
framework.

From a CBR point of view, FrameDoc structures its cases (recipes) as a domain 
description related to a framework solution (framework instantiation steps). In 
FrameDoc, the process runs as follows: a) the user states the problem as a set of
domain characteristics; b) FrameDoc retrieves the most similar cases extracted from the
framework's existing applications; c) it presents each solution as classes, methods and
the object-oriented operations to build it; d) from the existing problem solutions, the 
user combines the solution steps, viewing the generated code and testing how these 
changes affect the result; and, e), once the solution is accepted, FrameDoc integrates 
the new description and solution to be reused for future problem solving. 

A first approach to defining case granularity would be to associate a case with a
complete application. However, that may be too long a step for someone who is just
starting out. In order to simplify the process, we can identify typical sub-systems
within a family of applications, and define the cases at that level. For example, in
the JHotDraw framework, building a new application is a three-step process: creating
the Figures used by the application; creating Tools to handle the new figures; and
integrating Figures and Tools in a new running application. By distinguishing three
types of cases, we can define specialized query languages. For example, when the user
is defining a new Figure, it will be useful to know which is the most similar existing
figure, comparing shape and functionality, but when the user is defining a new Tool,
the most important features to compare are the actions that this tool can perform.

To illustrate the overall process we will use a sample case: the creation of an
application for UML Use Case Diagrams, and, in particular, the sub-case related to the
creation of a Use Case Figure. The next section describes the structure of the cases.



4.1 Case Structure 



In FrameDoc, cases are composed of a problem description written in domain terms,
connected to a problem solution expressed as framework instantiation steps.

Case descriptions consist of a list of attribute-value pairs for a given domain entity.
Legal attributes and values are obtained from the framework entities, so that FrameDoc
guarantees that any query stated in domain terms is within the solution space
covered by the framework.

Case solutions are composed of a set of instantiation steps, each one consisting of
an object-oriented primitive action performed on a framework entity to achieve a given
goal. Basically, the actions we consider a developer can take to obtain a new
application are: specializing a class, creating a new method, and overriding an
existing method. Although these may seem easy tasks, notice that the real complexity
stems from knowing what action to take in each situation and which is the best
candidate to use as the target of the action. For example, one of the sample figures
provided with JHotDraw is the Pert figure: a connectable figure composed of a
rectangle, a text for the task name, a text for the task cost and a text for the total
cost. The following excerpt shows the case that represents this figure in FrameDoc.



Case: PertFigure
  + has-description Description1
      Is-a: DC-COMPOSITE-FIGURE
      Has-figure (has-shape DI-RECTANGULAR)
      Has-figure (has-shape DI-TEXT)
      Has-figure (has-shape DI-LINE)
      Has-figure (has-shape DI-TEXT)
      Has-figure (has-shape DI-TEXT)
      Has-handle DI-NULL
      Has-handle DI-NULL
      Has-handle DI-NULL
      Has-handle DI-NULL
      Has-handle DI-FIGURECONNECTION
  + has-solution Solution1
      has-solution-step Step1
        step-number 1
        step (extends-class FW-CompositeFigure)
      has-solution-step Step2
        step-number 2
        step (override-method FW-basicDisplayBox(Point, Point))
      has-solution-step Step3
        step-number 3
        step (override-method update())
      has-solution-step Step4
        step-number 4
        step (add-method initialize())
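
For readers who prefer a programmatic view, the same case can be written as plain data.
The sketch below is one possible encoding, not FrameDoc's internal DL representation;
the Step and Case classes and the describe helper are assumptions made for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    action: str  # "extends-class", "override-method" or "add-method"
    target: str


@dataclass(frozen=True)
class Case:
    name: str
    is_a: str
    figures: tuple[str, ...]  # one has-shape value per Has-figure line
    handles: tuple[str, ...]  # one value per Has-handle line
    solution: tuple[Step, ...]


PERT_FIGURE = Case(
    name="PertFigure",
    is_a="DC-COMPOSITE-FIGURE",
    figures=("DI-RECTANGULAR", "DI-TEXT", "DI-LINE", "DI-TEXT", "DI-TEXT"),
    handles=("DI-NULL",) * 4 + ("DI-FIGURECONNECTION",),
    solution=(
        Step(1, "extends-class", "FW-CompositeFigure"),
        Step(2, "override-method", "FW-basicDisplayBox(Point, Point)"),
        Step(3, "override-method", "update()"),
        Step(4, "add-method", "initialize()"),
    ),
)


def describe(case: Case) -> list[str]:
    """Render the solution steps in order, one line per step."""
    ordered = sorted(case.solution, key=lambda s: s.number)
    return [f"{s.number}. {s.action} {s.target}" for s in ordered]


for line in describe(PERT_FIGURE):
    print(line)
```

Keeping the description (figures, handles) separate from the solution (steps) mirrors
the two halves of the case: the first is what retrieval compares, the second is what
the user adapts.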



4.2 Initial Case Base 

In a system like FrameDoc, it is critical to get an initial case base with cases that
are as representative as possible. That is not only a problem of how many cases are
used to populate the initial case base, but also a problem of how many hot spots of the
design are represented, since those are the points where new functionality can be
exploited. It is common for frameworks to ship with some sample applications showing
their most representative features. These samples show key aspects of the design of the
framework, since the framework developers themselves developed them. FrameDoc uses
these sample applications as the starting point to build the initial case base.

When building a new case, the description in domain terms must use the domain 
vocabulary, but nevertheless, the user may define new values for the attributes. It is 
not expected that the user modifies the structure of the domain model, although new 
descriptions can be added through a particular attribute-value combination. 

For each sample application, solutions are obtained by defining the steps, selecting
the primitive object-oriented programming actions and the target of every action. This
task is not very complex and has to be accomplished only once. Notice that
new cases will be incorporated into the system semi-automatically as the system is
used. If framework developers adopted our approach, this would be the extra
documentation effort that they would have to put into the system.

4.3 Case Retrieval 

Case retrieval is based on automatic classification as provided by the underlying DL
system. The user states the problem through FrameDoc's UI, which allows specifying
all the values required for the existing case attributes (for figures, tools and
integration, in the JHotDraw example). Then, the new case description is classified in
the taxonomy by means of the specific assertions made on it, and FrameDoc computes
similar cases as those classified close to the new one, using the inductive approach
explained in [9]. FrameDoc always tries to show the five best solutions, and lets the
user select the case to use as the starting point for adaptation.

For the proposed example, defining the figures of the Use Case Diagrams application,
the user could introduce into the system a description with the value elliptical for
the shape attribute, and one handle for figure connection. The FrameDoc user interface
for this operation presents the set of attributes defined for figures and, for each
one, the set of defined values, letting the user create new values for any attribute.

The query is represented as a new case description, i.e., an instance in the under- 
lying DL. Although the user states the problem description as a Figure, the system 
automatically infers that it is a CompositeFigure from the fact that this is a Figure 
with at least two “has-figure” relations. Once all the possible inferences have been 
made on the instance representing the case, according to the domain model, the sys- 
tem searches for the most similar case in the case base. In the example, the system 
would retrieve the PertFigure case as first candidate, and the EllipseFigure as second 
candidate. 
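
The retrieval step can be imitated outside a DL system. The sketch below replaces DL
classification and the inductive similarity of [9] with a much simpler technique:
a hand-written rule infers the composite type from the number of has-figure
relations, and cases are ranked by multiset Jaccard overlap of their features. The
query, the EllipseFigure and LineFigure descriptions, and the identifiers DC-FIGURE
and DI-ELLIPTICAL are assumptions made for illustration.

```python
from collections import Counter


def infer_type(figures) -> str:
    """A figure with at least two has-figure relations is a composite figure."""
    return "DC-COMPOSITE-FIGURE" if len(figures) >= 2 else "DC-FIGURE"


def features(figures, handles) -> Counter:
    """Multiset of (attribute, value) pairs, including the inferred type."""
    bag = Counter()
    bag[("is-a", infer_type(figures))] += 1
    for shape in figures:
        bag[("has-figure", shape)] += 1
    for handle in handles:
        bag[("has-handle", handle)] += 1
    return bag


def similarity(a: Counter, b: Counter) -> float:
    """Multiset Jaccard index: size of intersection over size of union."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0


def retrieve(query: Counter, case_base: dict, k: int = 5) -> list[str]:
    """Names of the k cases most similar to the query, best first."""
    ranked = sorted(case_base, key=lambda n: similarity(query, case_base[n]),
                    reverse=True)
    return ranked[:k]


CASE_BASE = {
    "PertFigure": features(
        ("DI-RECTANGULAR", "DI-TEXT", "DI-LINE", "DI-TEXT", "DI-TEXT"),
        ("DI-NULL",) * 4 + ("DI-FIGURECONNECTION",),
    ),
    "EllipseFigure": features(("DI-ELLIPTICAL",), ("DI-NULL",) * 4),
    "LineFigure": features(("DI-LINE",), ("DI-NULL",) * 2),
}

# Use case figure: an ellipse plus a text, with one connection handle.
QUERY = features(("DI-ELLIPTICAL", "DI-TEXT"), ("DI-FIGURECONNECTION",))

print(retrieve(QUERY, CASE_BASE))
```

With these assumed descriptions the ranking places PertFigure first and EllipseFigure
second, in line with the example in the text; a different case base or similarity
measure could of course rank them differently.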

4.4 User Adaptation 

If one of the retrieved solutions exactly meets the user requirements, the option "Use
as is" (Figure 3) can be selected and the figure is added directly to the project.
Otherwise, the FrameDoc adaptation process is a simple task; there are only
three possibilities for changing the functionality of a retrieved solution: adding new
primitive actions; deleting actions; and modifying actions.

Fig. 3. FrameDoc adaptation user interface.

The description presented for the primitive steps of the solution is an important
help for the users, since it is the first criterion for deleting or modifying steps.
Moreover, when the user selects a step from a retrieved case, the corresponding code
is selected in the "Retrieved Case Code Viewer" (see Figure 3), and when the user
selects a step in the solution, the corresponding code is selected in the solution code
editor panel, letting the user copy and paste code from the viewer to the code editor.
In this way, the user has all the available information to know whether the step is
necessary for her solution and what code to reuse from previous existing cases.

In the example, the user should conclude that the new figure must also be a subclass
of CompositeFigure (reusing as is the first step of the PertFigureCase) but that
the shape has to be changed because it draws the external rectangle of the figure. A
figure that does not show the external rectangle is the EllipseFigure, so the user could
reuse the basicDisplayBox of the retrieved EllipseFigureCase. The contained figures
also need to be changed, because UseCaseFigure has to contain an ellipse and a text.
Therefore, step 4 (add initialize method) needs to be adapted, replacing the inner
figures with an EllipseFigure and a TextFigure. In this case, it is easier to reuse the
step without a target (FrameDoc writes the code of an empty initialize method into the
new class), and use the code of the PertFigure to learn how to add figures to the
CompositeFigure.

During this adaptation process, FrameDoc offers the code corresponding to the set
of actions included in the solution. This is implemented in FrameDoc through code
templates, associated with the primitive actions, which get completed with the target
candidates selected for the actions. This way, the user gets a high-level description of
the solution for the new problem, and a tentative source code implementation.
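
The idea of code templates attached to primitive actions can be sketched as follows.
The template texts, the generate function and the method signatures are assumptions
made for illustration; the source does not show FrameDoc's actual templates. The
sketch emits a Java class skeleton from a list of (action, target) steps.

```python
from string import Template

# One hypothetical template per primitive action.
CLASS_TEMPLATE = Template("public class $name extends $target {\n")
METHOD_TEMPLATES = {
    "override-method": Template(
        "    // overrides a framework method\n"
        "    public void $signature {\n"
        "    }\n"
    ),
    "add-method": Template(
        "    // new method added by the application\n"
        "    public void $signature {\n"
        "    }\n"
    ),
}


def generate(class_name: str, steps) -> str:
    """Fill the templates with the targets chosen for each step."""
    header = None
    members = []
    for action, target in steps:
        if action == "extends-class":
            header = CLASS_TEMPLATE.substitute(name=class_name, target=target)
        else:
            members.append(METHOD_TEMPLATES[action].substitute(signature=target))
    if header is None:
        raise ValueError("a solution needs an extends-class step")
    return header + "".join(members) + "}\n"


USE_CASE_STEPS = [
    ("extends-class", "CompositeFigure"),
    ("override-method", "basicDisplayBox(Point origin, Point corner)"),
    ("add-method", "initialize()"),
]

print(generate("UseCaseFigure", USE_CASE_STEPS))
```

The generated skeleton is only a starting point: the method bodies are left empty so
that the user can paste in code reused from the retrieved cases.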

In the JHotDraw example, the user can also test the code with three testing
applications located in the Tools menu. The tester for figures shows them on an empty
canvas; the tester for tools operates over basic figures; and the third one is an
execution environment for the resulting application. This testing feature can, in
general, be provided for any framework by identifying minimum running applications.



5 Related Work 

We can find different efforts on applying Artificial Intelligence Techniques to Soft- 
ware Engineering problems. Some work, related to the one presented here, has been 
oriented to the formalization of Software Engineering Models as UML models [1] in 
order to provide verification, derivation and reasoning mechanisms over these mod- 
els. The presented approach is based on the formalization through Description Logics 
of UML diagrams, profiting from some UML extensions for domain [13] and frame- 
work representation [7]. 

Regarding software engineering and domain analysis, FrameDoc inherits in its in- 
ternal knowledge representation some of the FODA [11] ideas of domain entities and 
inspection. Also, FrameDoc applies Information Retrieval Techniques to domain 
analysis as proposed in DARE [8]. 

In the research area of CBR applied to Software Engineering, we should mention the
work presented in [2], where CBR is applied to high-level tasks such as planning or
software quality, reusing the knowledge of previous project developments in new
solutions. Finally, [9] shows how to reuse previous decision-making experience
when developing new applications. FrameDoc uses most of this in an implicit way,
because these decisions were mostly taken by framework developers. However,
FrameDoc could be complemented with this approach in order to assist in the generation
of new applications according to previous company (team) guidelines.



6 Conclusions and Future Work 

In this paper, FrameDoc has been proposed as a suitable approach for framework
active help. The ultimate goal of the FrameDoc system is to provide the user with a
workspace where she can introduce queries in a domain language, and be helped through
recipes with framework knowledge about how to implement them. FrameDoc helps
users by giving the global sequence of steps in framework instantiation, giving the
domain vocabulary, suggesting possible solutions to the new problem, and reusing
previous experiences.

Regarding the development effort of the proposed approach, FrameDoc supports
interactive domain modeling through reverse engineering of the framework, complemented
with Information Retrieval techniques. On the other hand, the initial case base
has to be built manually, although this is supported by the available framework samples
and takes place in the context of the framework models that bring in the vocabulary to
describe new cases.

We are now concentrating our efforts on further supporting the adaptation process
and further testing the approach. We plan to include an adaptation process based
on recording user adaptations (case-based adaptation), where the system will try to
apply previous adaptations to similar adaptation problems. In parallel, we are applying
the FrameDoc architecture to a more commercial framework, Struts, with an extensive
community of users that should allow for testing our approach in real developments.

The proposed architecture for this tool is based on a description logics server
engine with a Java-based client. In this way, users could work in a collaborative
environment, reusing cases from other users and supplying feedback for new problems.
However, although this type of environment is suitable as is, some effort must be
invested in defining processes for case base management, such as deleting duplicated
and irrelevant cases.

Abstract. Software reuse, when correctly employed, can make it feasible to
extend process control applications with controlled cost and effort. Component- 
based development is one of the important means to realise software reuse at 
the different development lifecycle stages. This paper illustrates the compo- 
nent-based development of process control systems using the GOPCSD tool. 
The GOPCSD tool guides the user to develop flexible requirements specifica- 
tion models for process control components that can be reused in different 
families of process control applications. The tool automatically generates a B 
specification corresponding to the corrected requirements. We illustrate the 
component-based development by examining a case study of a production cell. 
Finally, we draw conclusions and give directions for future work. 



1 Introduction 

Component-based development has gained considerable attention as one of the major
software reuse directions. This development process involves identifying, building
and testing software segments to be reused in similar applications from one domain
[25]. Since software development activities cover a broad area ranging from problem
understanding, requirements, specification and design to implementation and testing
[20, p. 6], component-related issues may be targeted at the different development
lifecycle stages. Still, most of the effort has been devoted to developing libraries
for high-level languages. These libraries usually contain implementation details and
are tailored for a specific programming language, such as ADA or JAVA. As a result,
reusability is restricted to a single implementation language, and further effort is
required to trace such implementation code segments back to the requirements and
specification levels.

Significant research has focused on software reuse at the pre-implementation
levels, as follows. Jackson [16] suggests reusing problem patterns to understand
and decompose complex user needs. Massonet and Lamsweerde [27] use query
generalisation to check the analogy between similar systems and then use formal
rules to elaborate the requirements for the derived systems. Reubenstein and Waters
[30] use specialisation of existing systems' requirements to create new systems. At
the specification level, Maiden and Sutcliffe [24] utilise analogy between similar
systems to import specification segments, whereas in RSML [22] the idea of building
software specification models of process control components has evolved to the
level of building state charts for the components and modelling their interaction using
higher-level state charts. Other efforts have focused on software components at
various development lifecycle stages, such as [2, 3, 4, 6, 13, 14].

Process control systems range from in-house appliances to sophisticated industrial
plants (such as oil refineries). In process control, systems are built out of physical
components, such as robots and tanks. Specifying these components constitutes the
starting point for developing new control applications or extending existing ones.
Then, if the components are interconnected properly, the overall control system will
function efficiently and meet the client's needs. Given the importance of
requirements analysis [28], a requirements model for the process control components
should be developed at such early levels, as in RSML [22], where state chart models
are used as a means of formal requirements specification.

Formal methods have been used to develop reactive systems. These methods enable
the user to develop an implementation that conforms to the early specification. In
particular, the B formal method [1] has refinement, hierarchical and abstract state
foundations, which make it suitable for specifying systems that are built of
sub-systems, such as process control systems. Moreover, the existing tools that support
B, such as the B toolkit [5], create a rigid route that traces the B formal
specification to the implementation stage, for example in JAVA. However, the level of
understandability and usability of the B method is not adequate for a client such as a
process control systems engineer, because using it entails dealing with the
sophisticated mathematical and logical nature of formal specifications; this creates an
interference of concerns between requirements and specification [23].

Thus, we were motivated to start component-based development as early as the
requirements level by adopting the goal-driven requirements analysis method of
KAOS [17] as a starting point. The hierarchical structure of a goal model provides a
feasible means for tracing user needs and refining them gradually, as noted in [8]. A
requirements analysis tool, GOPCSD (Goal Oriented Process Control Systems Design)
[10], has been developed for the specific domain of process control systems.
GOPCSD provides a dynamic requirements library, which supports reusable components
and general goal templates. Furthermore, the tool automatically translates the
complete and satisfactory requirements to formal specifications in B, which can in
turn be translated to high-level languages within the B toolkit environment or similar
environments; hence, the stage is prepared for the software engineer to manipulate the
application from the software design and architecture point of view.

In this section, we have briefly outlined the research area. In section two, we
describe our implementation and adaptation of goal-oriented requirements analysis of
process control systems. In section three, we illustrate the component-based
development of the GOPCSD tool. In section four, we examine a recurring example in the
literature, a production cell [21], to illustrate integrating an application from
prefabricated and trusted components. In section five, we briefly describe the process
of generating B machines. Finally, in section six, we draw the main conclusions and give
suggestions for future research directions.

2 The GOPCSD Tool

GOPCSD [10] has been developed to increase the separation of concerns between
requirements and specification; these concerns are divided between the process control
systems engineer, as the client, and the software engineer, who supplies the
formal specification [23]. To accomplish this, we have developed an integrated
requirements development environment [11], where the process control requirements
can be constructed using a provided library, structured in terms of hierarchies of
goals, and then checked, corrected, validated and finally automatically translated to a
B formal specification, in readiness for the software engineer to proceed with
implementation activities. The goal-driven requirements of process control systems are
represented within the GOPCSD tool by the following elements:

The Components. Components represent the physical parts of the applications, such 
as valves, robots, and deposit belts. The detailed specifications of each component, 
including its variables, agents and goal-models, are stored in the GOPCSD library. 
The systems engineer can create/edit component details using the GOPCSD library 
manager. 

The Variables. Variables are considered as the essential part of formalizing the user 
requirements. In the GOPCSD tool, the application’s global state is described by a set 
of variables. The variables are associated with the high-level goal-model templates or 
the components, each of which the user can import from the library; however, the 
tool’s user can still create, edit, and delete variables from the application design 
space, as considered necessary. 

The Agents. Agents are the objects that control the application and its local environ- 
ment. Some of the agents can be part of the application to be built, like software inter- 
face programs for hardware parts, or, alternatively, they can be software programs or 
hardware devices that will be responsible for accomplishing pre-defined tasks (goals) 
to fulfil the overall application operation. The main source of agents is through the 
user importing components from the library. But, if it is required to declare agents 
apart from those associated with the components, the user can create, edit, and delete 
them from within a design space being developed in the GOPCSD tool environment.

The Goal-models. Goal-models constitute the main segments of the structured
requirements; they represent the user requirements in a hierarchy of goals. Each
goal-
model starts with a main goal that traces one aspect [12] and has general scope; this 
main goal is usually refined to a number of sub-goals describing sub-parts, different 
aspects, or operation-modes of the application. Each of the goals within the goal 
model represents a safety, security, quality or operational requirement; the tool sup- 
ports informal descriptions of goals as well as formal descriptions based on temporal 
logic [26]. 
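
To make the relationship between these four elements concrete, the following sketch
models them as plain Python data. It is not GOPCSD's internal format: the Goal and
Component classes, the variable and agent names, and the goal names for the press are
assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    name: str
    kind: str  # "safety", "security", "quality" or "operational"
    description: str = ""
    subgoals: list["Goal"] = field(default_factory=list)

    def leaves(self) -> list["Goal"]:
        """Goals at the bottom of the refinement hierarchy."""
        if not self.subgoals:
            return [self]
        return [leaf for g in self.subgoals for leaf in g.leaves()]


@dataclass
class Component:
    name: str
    variables: list[str]
    agents: list[str]
    goal_models: list[Goal]


# Hypothetical press component with one goal-model.
stamp = Goal(
    "StampMetal",
    "operational",
    "Stamp a blank and return to the normal state",
    subgoals=[
        Goal("MovePressUp", "operational"),
        Goal("MovePressDown", "operational"),
    ],
)
press = Component(
    name="Press",
    variables=["press_position", "press_motor"],
    agents=["press_controller"],
    goal_models=[stamp],
)

print([g.name for g in press.goal_models[0].leaves()])
```

The leaves of a goal-model are the goals that are not refined further in the sketch.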



3 Developing Requirements Models for the Components 

As mentioned earlier, on the one hand, process control components are considered as
the building blocks of the process control systems. On the other, they are regarded as
independent sub-systems, which can be constructed, checked, validated and modified
if required, in much the same way as the overall process control applications are
developed in GOPCSD.

Achieving success in building requirements reuse libraries [20, p. 77; 29, pp. 218-222]
depends on the suitability of the elements to be reused in applications from the same
domain. The components are organised within the library as follows:

• Each library is considered as a family of related applications, like gas burners, 
chemical reactors, production cells. 

• Each application family has a list of applications, e.g., the production cell library 
includes applications like simple production cell, double-press production cell, 
and fault-tolerant production cell. 

• Each application family contains a list of high-level templates that constitute the 
main features defined via such an application (like feeding blank metals or deliv- 
ering the processed metals in production cells). 

• Each high-level template is a single goal-model whose root goal will be mapped
into one of the application’s sub-goals. The template also has associated variables
and agents (which very rarely appear in templates).

• Each library has a list of components that represent the physical building blocks
of its applications, like valves and switches.

• Each component has associated low-level goal-models, variables, and agents to
control the output variables.

The use of templates and components from the GOPCSD library is similar to the basic
concepts of production lines, in the sense that it provides a basis (goal-model templates)
that can be refined differently for applications from the same family (for example,
different models of production cells with variant components).
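The library organisation listed above can be pictured as a small data model. The following is only a minimal sketch under our own assumptions: the class and field names (GoalModel, Component, ApplicationFamily, and so on) are hypothetical and are not taken from the GOPCSD tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GoalModel:
    """A goal-model, identified here only by its root goal (illustrative)."""
    root_goal: str
    variables: List[str] = field(default_factory=list)
    agents: List[str] = field(default_factory=list)


@dataclass
class Component:
    """A physical building block such as a valve, a switch or a press."""
    name: str
    goal_models: List[GoalModel] = field(default_factory=list)


@dataclass
class ApplicationFamily:
    """One library: a family of related applications."""
    name: str
    applications: List[str] = field(default_factory=list)
    templates: List[GoalModel] = field(default_factory=list)   # high-level templates
    components: List[Component] = field(default_factory=list)  # low-level building blocks

    def component(self, name: str) -> Component:
        """Look a component up by name; raises KeyError if it is not in the library."""
        for candidate in self.components:
            if candidate.name == name:
                return candidate
        raise KeyError(name)


# Hypothetical content for the production cell library.
press = Component(
    name="press",
    goal_models=[GoalModel("stamp_metal",
                           variables=["press_position"],
                           agents=["press_controller"])],
)
production_cells = ApplicationFamily(
    name="production cells",
    applications=["simple production cell",
                  "double-press production cell",
                  "fault-tolerant production cell"],
    templates=[GoalModel("feed_blank_metals"),
               GoalModel("deliver_processed_metals")],
    components=[press, Component("belt")],
)
```

Keeping templates and components in separate lists mirrors the distinction drawn in the text between high-level templates of a family and the physical building blocks shared by its applications.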




Fig. 1. The Components of the Production Cell 

To illustrate the concept of components in GOPCSD, we examine a case study of a 
production cell [21]. As shown in fig. 1, the production cell consists of a feed-belt, 
rotary table, robot, press, and a deposit belt. The cell receives blank metals, stamps 
them and finally, delivers the processed ones to the next stage or the collection area. 

For example, the requirements of the production cell’s press component are shown
in fig. 2, which provides an abstract specification of a press component that can
be reused/imported in different production cells. The press component has input/output
variables and agents that control the output variables. In addition, the
component has goal-models to specify basic operations, like moving the press
table up and down, and more elaborate operations, such as stamping metal and
returning to the normal state (these compound component goal-models usually
appear in the low-level parts of the applications’ goal-models).

Having built the requirements goal-models of the components, the GOPCSD tool
can provide a number of checks and validations (see sections 4.2 and 4.3) to ensure that
the developed component is free of bugs (testing components as in [13, 14]).

Fig. 3 shows a consistency test, in which the goals that conflict under a particular
condition (G3 and G6) are displayed for the user to resolve the conflict. This reveals some
of the hidden bugs within the requirements specification of the press component.
Resolving these conflicts at the component-development stage is better than waiting until
the integration level, where a single goal conflict may appear many times and
potentially cause difficulties. Although the components can be trusted in this sense, the
application still needs to be tested and checked to discover integration bugs.




Fig. 2. The Press component as stored in the library. 



4 Importing the Production Cell Components 

The tool provides utilities to import, map and rename the components and their contents.
Thus, in the production cell case study, the user will be able to import a belt
component twice (once for the feed belt and once for the deposit belt). Then, the two
imported belt instances can be renamed to feedbelt and depositbelt, respectively.
Moreover, the GOPCSD tool extends the mapping utility to the level of
the component’s details (variables and agents), so that systems can be composed from
components and sub-systems. The applications are then constructed as follows:

4.1 Constructing the Requirements of the Production Cell 

After importing the production cell components and templates, the user needs to focus
on specifying the interaction between the different components. The imported
components’ low-level goal-models already contain the detailed specifications of the
operation of the robot, rotary table, press, deposit belt and feed belt. In addition,
high-level production cell templates such as feed-metals and deliver-metals serve as
outlines that shorten the requirements development time and guide the
structuring of the requirements. The user should now focus on integrating the
components’ goal-models by refining the incomplete goal-models and combining
the separate goal-models.

The main goal (G1) of the production cell can be refined into safety (G2), liveness
(G3), throughput (G4), and operational (G5) goals, as shown in fig. 4; thus, this goal
can be refined as a conjunction (see footnote 1) of four sub-goals (tracing different
aspects [12]), each of them addressing one of these aspects. After this step, the incomplete
goals should be refined using the previously defined patterns; the refinement process should
continue until each goal is simple enough (see footnote 2) to be assigned to one of the
application agents. The safety goal G2 can be refined as a number of conjunctive goals that
are designed to prevent the machines from colliding with each other and to avoid having two
metals on the table tray or inside the press at the same time. Avoiding collision
between the different machines can be refined as avoiding the table hitting robot
arm1, and avoiding the press hitting robot arm1 and arm2, as shown in fig. 4.




Fig. 3. Checking the consistency of the Press component 

4.2 Checking the Requirements of the Production Cell 

After constructing the goal-model of the production cell, the tool guides its user to
capture the requirements bugs that may still exist within the specified requirements
as a result of imperfect integration, as in [6]. The checks and tests that the
requirements of the overall system undergo in this phase serve as a feedback path to
correct the requirements and possibly the user needs. In the following sub-sections,
we list the formal and informal checks that can be applied to the requirements within
the GOPCSD development environment.

1 GOPCSD has six different goal-refinement patterns (alternative, sequence, inheritance,
simultaneous, conjunction and disjunction) [11].

2 Each output variable controlled by the goal can be manipulated by a single agent.

Checking the Goal-model Structure. Before checking the formal parts of the goal-model,
the user can debug the goal-model by checking the correctness of its structure
and whether each functional terminal goal has been assigned to an agent or is
considered a non-functional goal. Also, the tool ensures that the output variable
accessed by the terminal goal is uniquely controlled by the same agent.
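The structural check described here can be sketched in a few lines. This is an illustrative sketch only, not the GOPCSD implementation: the Goal class, its fields and the example goal and variable names are our own assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set


@dataclass
class Goal:
    """Hypothetical goal node; a goal without children is a terminal goal."""
    name: str
    children: List["Goal"] = field(default_factory=list)
    agent: Optional[str] = None      # agent assigned to a terminal goal
    functional: bool = True          # False marks a non-functional goal
    outputs: List[str] = field(default_factory=list)  # output variables it writes


def terminal_goals(goal: Goal) -> Iterator[Goal]:
    """Yield the leaves of the goal tree in left-to-right order."""
    if not goal.children:
        yield goal
    for child in goal.children:
        yield from terminal_goals(child)


def check_structure(root: Goal) -> List[str]:
    """Report unassigned functional terminal goals and shared output variables."""
    problems: List[str] = []
    controllers: Dict[str, Set[str]] = {}
    for goal in terminal_goals(root):
        if goal.functional and goal.agent is None:
            problems.append(f"{goal.name}: functional terminal goal has no agent")
        if goal.agent is not None:
            for variable in goal.outputs:
                controllers.setdefault(variable, set()).add(goal.agent)
    for variable, agents in sorted(controllers.items()):
        if len(agents) > 1:
            problems.append(f"{variable}: controlled by several agents {sorted(agents)}")
    return problems


# Hypothetical fragment of a production cell goal-model.
cell = Goal("G1", children=[
    Goal("move_press_up", agent="press_controller", outputs=["press_motor"]),
    Goal("retract_arm", agent="robot_controller", outputs=["press_motor"]),
    Goal("rotate_table", outputs=["table_motor"]),
])
problems = check_structure(cell)
```

In this example the check reports two problems: rotate_table has no agent, and press_motor is written by goals assigned to two different agents.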

Reasoning about the How and Why of Goals. As a first check, the user can reason
about the why or the how of any of the goals, as in [17]; this reasoning can motivate
the user to elicit new goals and/or modify existing ones by tracing the informal
descriptions of the sub- and super-goals.

Obstacle Analysis. Obstacle analysis as described in [18] is very useful to perform, 
especially in medium- or large-scale systems. Obstacle analysis requires further at- 
tention from the user, in considering conditions not dealt with in the goal model that 
can inhibit one of the goals from being accomplished. 

Goal-Conflict Analysis. After ensuring that the user has completed the goal-model, it is
important to ensure that the goals of the goal-model are consistent. Otherwise, deviations
and unexpected scenarios may take place at run-time. A goal-conflict [9, 19] arises when
two or more goals prescribe the performance of inconsistent actions at the same time.
Goal-conflict analysis enables the user to remove any inconsistency that might exist within
the requirements. In GOPCSD, goal-conflicts are checked by comparing the pre-conditions of
goals that assign different values to the same variables. Resolving the goal-conflicts
guides the user to bring together the application’s various goals. For example, a conflict
was reported between goals G11 and G40: goal G11 (a safety goal) attempts to stop the table
from rotating while the robot arm is still extended, whereas goal G40 (a throughput goal)
attempts to return the table as quickly as possible to its normal position to receive a new
blank metal. The conflict can be resolved by strengthening goal G40’s pre-condition to take
the safety condition into account.
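The idea of comparing pre-conditions of goals that assign different values to the same variable can be illustrated with a brute-force search over finite variable domains. The sketch below is an assumption-laden illustration rather than the tool's algorithm: the variable names (arm_extended, table_loaded, table_motor), the pre-conditions and the encoding of a goal as a (pre-condition, assignment) pair are all hypothetical.

```python
from itertools import combinations, product

# Hypothetical finite domains of the input variables.
DOMAINS = {"arm_extended": [False, True], "table_loaded": [False, True]}


def states(domains):
    """Enumerate every combination of variable values as a dict."""
    names = sorted(domains)
    for values in product(*(domains[name] for name in names)):
        yield dict(zip(names, values))


def find_conflicts(goals, domains):
    """Return (goal, goal, witness state) for each pair of goals that assign
    different values to the same variable while both pre-conditions hold."""
    conflicts = []
    for first, second in combinations(sorted(goals), 2):
        pre1, (var1, val1) = goals[first]
        pre2, (var2, val2) = goals[second]
        if var1 != var2 or val1 == val2:
            continue  # different variables or identical values never conflict
        for state in states(domains):
            if pre1(state) and pre2(state):
                conflicts.append((first, second, state))
                break  # one witness per pair is enough for the report
    return conflicts


# Safety goal: stop the table while the arm is extended.
# Throughput goal: rotate the table back as soon as it is empty.
goals = {
    "G11": (lambda s: s["arm_extended"], ("table_motor", "stop")),
    "G40": (lambda s: not s["table_loaded"], ("table_motor", "rotate_back")),
}
conflicts = find_conflicts(goals, DOMAINS)

# Resolution: strengthen the throughput goal's pre-condition with the safety condition.
resolved = dict(goals)
resolved["G40"] = (lambda s: not s["table_loaded"] and not s["arm_extended"],
                   ("table_motor", "rotate_back"))
remaining = find_conflicts(resolved, DOMAINS)
```

Exhaustive enumeration is only practical for small finite domains; it is used here because it makes the witness state, and therefore the condition the user has to resolve, explicit.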

Reachability Check. In large-scale systems, there is a possibility of having user 
errors within the formal description, which might result in useless goals that will 
never become active. The GOPCSD tool can detect such cases provided that the ap- 
plication variables have finite domains. By reporting these unreachable goals to the 
user, he/she should be able to correct the formal description. 
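As a simplified illustration of this check, a goal can be flagged when its pre-condition is false for every combination of values in the finite domains. This sketch assumes that "never becoming active" can be approximated by an unsatisfiable pre-condition; the goal names, variables and domains are hypothetical.

```python
from itertools import product


def unreachable_goals(preconditions, domains):
    """Return the goals whose pre-condition holds in no combination of values."""
    names = sorted(domains)
    all_states = [dict(zip(names, values))
                  for values in product(*(domains[name] for name in names))]
    return sorted(goal for goal, holds in preconditions.items()
                  if not any(holds(state) for state in all_states))


domains = {"press_pos": ["bottom", "middle", "top"], "loaded": [False, True]}
preconditions = {
    "G_stamp": lambda s: s["loaded"] and s["press_pos"] == "middle",
    # A user error in the formal description: "midle" is not in the domain.
    "G_release": lambda s: s["press_pos"] == "midle",
}
dead = unreachable_goals(preconditions, domains)
```

Here G_release is reported because no value of press_pos can ever satisfy its pre-condition, which is the kind of mistake the user would then correct in the formal description.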

Completeness Check. Completeness analysis can reveal situations that are not
considered by the systems engineer. In the GOPCSD tool, we follow the definition of
completeness in [8]. A completeness check verifies the condition that for each com- 
bination of the application variables there should be a defined action(s) to be taken, 
and these actions determine the output variables. 




Fig. 4. The complete goal-model 
Although for some variable combinations the user may decide that these situations
will not happen, he/she still needs to be informed of these situations.
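A completeness check of this kind can be sketched as a search for variable combinations that no goal covers. The code below is a minimal sketch under our own assumptions (goals reduced to their pre-conditions, hypothetical variable names); it does not reproduce the definition in [8] in full.

```python
from itertools import product


def uncovered_states(preconditions, domains):
    """List the combinations of variable values for which no goal is active."""
    names = sorted(domains)
    missing = []
    for values in product(*(domains[name] for name in names)):
        state = dict(zip(names, values))
        if not any(holds(state) for holds in preconditions):
            missing.append(state)
    return missing


domains = {"blank_on_belt": [False, True], "table_ready": [False, True]}
preconditions = [
    lambda s: s["blank_on_belt"] and s["table_ready"],   # feed the blank
    lambda s: not s["blank_on_belt"],                    # wait for a blank
]
gaps = uncovered_states(preconditions, domains)
```

The single gap found in this example (a blank is waiting but the table is not ready) is reported to the user, who may either add a goal for it or decide that the situation cannot occur.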

Validating and Modifying the Requirements. The tool provides animation based on
symbolic execution of the goal-model, so that the user can review the system operation
and observe its performance. This is required to obtain his/her agreement to the
gradually stated and formalized requirements. In addition, this can be considered as a
late checking phase to correct or enhance any particular requirement.

One of the basic concepts of GOPCSD is to guide the systems engineer to
correct/complete/enhance the requirements rather than to start with correct and complete
ones. To achieve this, the tool provides different feedback approaches to help the user
modify or add goals of the checked goal-model. Although these checks and tests
attempt to enhance the requirements from different points of view, they all share a
feedback guidance that identifies the goals recommended for change or modification and
recommends the modifications to perform.



5 Generating B Specification Machines 

The tool generates a B specification guided by the variables’ relationships within the 
goal-model. However, this B generation process is hidden from the systems engineer; 
thus he/she does not have to know much about B or formal methods. Nevertheless,
the GOPCSD tool produces documented formal specifications, to the extent that this 
is possible, using the informal descriptions of the goals, variables, agents and data 
types. The generated B machines are as follows: 

Data types machine. A machine involving the definition of the variables’ types; this 
information is collected from the variables’ details created within the application or 
from the library. 

Actuator machines. These machines represent the actuators (related to the output 
variables of the components), based on the different operations that each agent could 
perform, as defined within the terminal goals. 

A main controller machine. This machine is based on the goal-model hierarchy, 
representing the terminal goals as pre- and post conditions; the informal details of the 
terminal goals will be generated as comment lines at the proper sites within the B 
code to enable the software engineer to understand the operation of the application. 

These generated B machines can be further refined and processed by the software 
engineer within the B toolkit environment with a high confidence that the systems 
engineer has agreed the requirements, until executable code is eventually generated. 



6 Conclusions 

The requirements analysis stage plays a key role within the development lifecycle of
software applications [28, 31]. Understanding the early need for component-based
development as a means of achieving software reuse, we have developed GOPCSD
[10]. The tool is goal-driven in nature, adapted from the KAOS method to address the
domain of process control systems. Specializing the tool for process control
systems makes it possible to build an effective tool that is closer to the systems
engineer’s perspective [23] and is capable of applying software development, reuse and
testing concepts in a fashion tailored specially to process control systems.

Although targeting process control systems, GOPCSD differs from RSML [22] 
and SCR (Software Cost Reduction) [15] in three main aspects: providing multiple 
levels of abstraction, mixing informal and formal description and grouping the related 
requirements. These aspects increase the traceability and understandability of the built 
requirements/generated specification models. 

Requirements models of process control components are kept in the GOPCSD library
in a generic and abstract format. The flexibility and detail-hiding of these
components reduce the development effort. Moreover, reusing requirements components
does not obstruct reusability at the implementation stage.

Further attention can be devoted to modularizing goal-driven process control
systems and building components out of sub-components so that they can easily be reused
in other applications.
Abstract. Correct requirements determination is a critical factor in software
development. Having reusable requirements elements, both qualified and classified, stored
in a repository might help to reduce the probability of errors in requirements
specifications, but the diversity of requirements formats is a constraint on
their reuse. To solve this problem, a common requirements model is presented that allows
the standardization of requirements and the transformation of some types of requirements
into others. The transformations use an intermediate representation based on Petri
nets, which provides rigor to the models and allows their consistency to be checked.
Transformation algorithms are defined and implemented as part of a requirements
management and reuse tool.



1 Introduction 

The software reuse approach has successfully contributed to improving the software
development process in restricted and well understood domains. Several reuse approaches have
appeared, which deal with software development on the basis of sharing reusable
elements between the members of a product line. In general, this product line based
approach increases productivity and reduces the development cost for each product,
improving its quality [2]. Our approach is based on the storage of reusable elements in a
repository for future use by means of either composition or generation. This approach is
based on coarse grain reusable elements, called mecanos [8], which are complex structures
made up of fine grain elements (assets) that belong to different levels of abstraction
(requirements, design and implementation). The requirements are the access gate to the
elements stored in the repository. The success of software reuse depends, to a great
extent, on how the requirements are treated, both in the definition phase for reuse and in
the search phase during development with reuse. Requirements assets are frequently
based on natural language, which gives rise to problems of ambiguity, uncertainty and
a lack of definition. These assets include:

- Requirements with different degrees of formalization: natural language, semiformal, 
formal notation 

- Declarative (rules) and procedural (scenarios) requirements 

- Aims (goals and softgoals) and means to reach them 
- Requirements of varying granularity: requirements documents, global requirements,
atomic requirements

- Functional and non-functional requirements 

- User requirements and developer requirements 

Pohl [20] quotes several accepted definitions of requirements and establishes a very 
detailed classification of the different types. He distinguishes between three major blocks: 
functional, non-functional and information requirements and he reviews the treatment of 
the requirements included in 21 documents of standards and methodological guides. On 
the other hand, he considers three dimensions in requirements engineering: specification 
(from least to most complete), representation (from least to most formal) and agreement 
(from different personal views to a common view). 

From the perspective of reuse, the requirements which are of interest to us are: user, 
functional, fine or medium granularity, procedural or declarative format and low level of 
formalization. The reason for this choice is that the end users see their problems in this 
way and the search in the repository of one or several mecanos that solve these problems 
should start from this basis. However, in later phases, non-functional requirements can be 
taken into account. In order to deal with these requirements it is necessary to introduce 
a certain degree of standardization in the assets so they are more easily identifiable, 
comparable and can be related to each other. 

The most frequent approaches, from the point of view of the end user, are scenarios, in 
diverse variations, goals, and business rules. The most widely used scenarios are the use 
cases, introduced by Jacobson [10] and updated in UML [3]. However, other variations 
should be considered, in particular business processes or workflows. The scenarios are 
usually based on natural or structured language. Thus, from the point of view of reuse,
it is desirable that requirements of this type follow some kind of norm that allows
them to be compared for their incorporation into new developments.

A goal is an objective the system under consideration should achieve [19]. Goals 
refer to intended properties to be ensured and may be formulated at different levels 
of abstraction, ranging from strategic to technical concerns. Goals also cover different 
types of concerns: functional concerns associated with the services to be provided, and 
non-functional concerns associated with quality of service. Goals and softgoals are also
differentiated: satisfaction of softgoals cannot be established in a clear-cut sense [19],
but goal satisfaction can be established through verification techniques [4].

Business rules [18], due to their declarative nature, can be described in a formal 
way, by means of logical expressions, calculation formulas or decision tables. On the 
other hand, a complex rule can be expressed as a composition of other simpler rules 
and it is thus relatively simple to define a format for its introduction in the repository. 
In any case, the relationships between rules, scenarios and information requirements 
can be established. In an analogous manner, a relationship between goals, entities and 
scenarios can be established [4] because they have complementary characteristics. 

From the structural point of view, Duran [5] breaks down the scenarios (as use cases) 
into their elemental parts. This possibility of breaking a requirement down into its atomic 
parts (for instance, a step in a scenario or a business rule condition) is fundamental in 
order to exchange, or even automatically generate, requirements in different formats, 
compatible with different tools. 
As for requirements in the reuse process, outstanding authors [7,21] maintain that
their introduction substantially improves the development process. However, since the
requirements represent the main means of communication with clients, their format is not
suitable for representation in a computer, and much less for reuse in a simple and direct
way. Given that simple classification and search techniques are not very effective,
the most promising methods include development environments and CASE tools [21]
or knowledge representation techniques, such as those proposed by Lowry [17]. All
these reuse techniques stress the semantics of independently obtained requirements.
Alternatively, Domain Engineering proposes the creation of reusable requirements from
the start. FODA [11] is the most representative technique within this tendency. The most
up to date methods, such as FeatureRSEB [9] and FORM [13], are, directly or indirectly,
based on this technique.

We must, therefore, standardize the requirements stored in the repository. Often, however,
the requirement format is not suitable for this repository and a transformation
mechanism is needed. The basis for this mechanism must be a common model of
requirements that integrates most concepts from the requirements engineering discipline.
The rest of the article proposes some solutions to these problems. Section 2 proposes a 
standardization approach and section 3 discusses a meta-model that supports the trans- 
formation of requirements. Section 4 presents some tools that support these techniques. 
The conclusions and future work are included in the last section. 



2 Standardization of Functional Requirements 

Faced with this diversity of requirements, a standardization of the requirements assets 
is necessary. This standardization would make the comparison of the assets present 
in the repository with the user needs easier. In our GIRO repository a format based 
on templates was initially chosen. Our initial experience was based on the linguistic 
patterns proposed by Duran [5], which can be used both during elicitation meetings with 
clients and users and to register and manage the requirements stored in the repository. 
As a result of such experience, standard phrases have been identified that are usual in 
requirements specifications. The structuring of the information in the form of a template 
and the standard phrases proposal facilitate the writing of the requirements, guiding the 
developers in feeding the repository. For our purposes, the most interesting templates 
are those related to functional requirements, information requirements and objectives. 
The objectives or goals of the system can be considered as high level requirements [22], 
so the requirements themselves would be the means to achieve the objectives. 

The domains dealt with are University management, Disability tools and Digital 
image treatment. Garcia [8] gives details of the experience and the conclusions obtained 
during the introduction of the corresponding assets in the GIRO Repository. A proprietary 
repository has been used, based on a conventional database manager, which implements 
the reuse model. 

From this initial experience, the need for some kind of additional standardization that
goes beyond the use of templates has become clear. The starting point is to use workflows
modeled with Petri nets, thus permitting the automatic generation of use cases and other
requirements assets that can be included in the repository. In this way, the strength of
use cases is reinforced by the use of workflows, which allows scalability and traceability.
At the same time, the informality of the use cases is corrected through a robust formalism
provided by the Petri nets.

Fig. 1. General framework for the standardization of user requirements capture

Figure 1 shows the reference framework used to standardize
the requirements. Two levels are established: the user level has an external view of the
system as a black box, while the software engineer level corresponds to the interior of
this box. On the software engineer level there is a view of the requirements engineer,
which acts as an interface between the two levels. The user provides the first approach to
the system’s functional requirements through the definition of administrative workflows
[12], which comply with the standards defined by the Workflow Management Coalition
(WfMC) [23]. In this stage, the methodological proposal provides a preliminary definition
of the user requirements, and the functionality of the system is modeled from this by using
a case graph (CG). By analyzing the case graph, business use cases (BUC) and use cases
(UC) are obtained. Finally, the assets generated in this way are ready to be used in the
context of software reuse. Consequently, the general framework leads to requirements
assets that are suitable for being associated, through the repository management interface,
with the corresponding design and implementation assets to form mecanos. In [16] this
proposal is developed in detail, using a formalism based on Petri nets, and is applied
to the real case of an electric company. In a similar way, other transformations can be
defined in order to obtain scenarios, data flow diagrams (DFD) or activity diagrams.
The next step is to generalize this idea using a requirements model that integrates the
concepts of the Requirements Engineering discipline, including scenarios, use cases,
goals, etc. This model and the general transformation strategy are presented in the next
section.
3 A Model of Requirements for Transformation

Within the requirements standardization context, the problem of transforming
requirements from one type to another must be faced. The aim is to find a common
representation for the requirements that imply user interaction (mainly scenarios and
workflows). Different views have led to a plethora of techniques that have a lot in
common:

- Software engineers handle mainly use cases and scenarios to represent user-system 
interaction 

- Business management technicians use tools for business process reengineering 
(BPR) 

- Starting from use cases, extensions (BUC) are proposed for representing business 
processes and to allow their study and improvement 

- Workflow managers are used in large companies as the basis for co-operative work 
and for checking and automation of complex processes 

- The WfMC [23] proposes a standard framework for defining workflow models and 
their implantation 

- UML derives a specialized statechart diagram (the activity diagram) to represent,
exactly, the business processes [3]

The current situation leads us to pay special attention to two extreme levels: first, the
global business level, which includes workflows and BUC; second, the user-computer
interaction level, where scenarios or use cases are applied. In addition, the activities of
a workflow can be seen as use cases, so the two techniques can be related. The hypothesis 
is that all these techniques are essentially identical and can be treated homogeneously so 
that an activity within a workflow is refined as another workflow, in a recursive manner, 
until an elemental activity is achieved which represents a simple interaction between an 
actor and a system. 

The search for a common format to represent the different types of functional
requirements internally, and to facilitate their comparison, is based on different works. Lee
[14] maintains that a use case can be converted into a Petri net and, vice-versa, a
workflow can be built internally as a Petri net [1]. How to use a workflow to automatically
generate use cases has been illustrated above.

We have elaborated a model that represents the relation between the different types 
of functional requirements through their common elemental components, defining a 
unifying language to integrate the different terminologies. To do so, we have defined 
a requirements meta-model (upper part of figure 2). This meta-model includes the Se- 
quence Specification Template as a common format to internally represent the different 
types of functional requirements in the repository and to facilitate their comparison. 

The central elements of the meta-model are the Requirements Representation Model, 
which describes different requirements diagrams, the Modeling Unit, which describes 
units that belong to the requirements diagrams, and the Domain Objective, which rep- 
resents domain knowledge. There are six categories of Modeling Units: 

- Activity: It models a process that may be a job or an action. Job is an activity formed 
by other activities. Action represents an atomic activity, that is, one that is never 
subdivided. 
- Subject: It is a person, company unit or autonomous system that is directly associated
with and in charge of the activities.

- Goal: It represents the specific intentions of users and system in the context of 
interaction between users and systems. 

- Constraint: It consists of information constraining the system functionality. It
represents temporal constraints, or required resources for the system.

- Connector: It represents a criterion for semantic ordering of modeling units which 
can be of three kinds: Linear, Joint, or Split. 

- State: It represents a dynamic situation (a period of time) during which an entity is 
satisfying some condition, performing some activity, or waiting for some event. 
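The six categories of Modeling Units, and the recursive refinement of a Job into atomic Actions, could be encoded as follows. This is a hedged sketch only: the paper defines the meta-model in UML (figure 2), and the Python class layout, the atomic_actions helper and the university example are our own illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ConnectorKind(Enum):
    LINEAR = "Linear"
    JOINT = "Joint"
    SPLIT = "Split"


@dataclass
class ModelingUnit:
    """Common base of the six categories of Modeling Units."""
    name: str


@dataclass
class Action(ModelingUnit):
    """Atomic activity: one that is never subdivided."""


@dataclass
class Job(ModelingUnit):
    """Activity formed by other activities (Jobs or Actions)."""
    parts: List[ModelingUnit] = field(default_factory=list)


@dataclass
class Subject(ModelingUnit):
    """Person, company unit or autonomous system in charge of activities."""


@dataclass
class Goal(ModelingUnit):
    """Specific intention of users and system."""


@dataclass
class Constraint(ModelingUnit):
    """Temporal constraint or required resource."""


@dataclass
class State(ModelingUnit):
    """Dynamic situation during which an entity satisfies some condition."""


@dataclass
class Connector(ModelingUnit):
    """Criterion for the semantic ordering of modeling units."""
    kind: ConnectorKind = ConnectorKind.LINEAR


def atomic_actions(activity: ModelingUnit) -> List[str]:
    """Flatten a Job recursively into the names of its atomic Actions."""
    if isinstance(activity, Job):
        names: List[str] = []
        for part in activity.parts:
            names.extend(atomic_actions(part))
        return names
    return [activity.name]


# Hypothetical example from a university management domain.
enrol = Job("Enrol student", [
    Action("Fill in form"),
    Job("Validate enrolment", [Action("Check fees"), Action("Check prerequisites")]),
])
steps = atomic_actions(enrol)
```

Modeling Job and Action as separate classes makes the stopping condition of the recursive refinement explicit: refinement ends when only Actions remain.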

The Unit Relationship class allows the relationships between Modeling Units to be 
described in the meta-model. This relationship represents the links between elements 
inside a requirements diagram. Model Relationship has a structural issue that determines 
the degree of association between two or more related diagrams. Some modeling units 
may have such a complexity that they need to be specified in another complete Require- 
ments Representation Model. For example, a process in a Data Flow Diagram may be 
exploded in another Data Flow Diagram, or a use case may be specified as a Sequence 
Specification Template. These relationships are described as Unit Model Relationships. 

The meta-model allows several diagrams to be instantiated, thus the diagram in- 
stances are integrated in our reuse framework. Each requirements diagram is represented 
as a “package”, benefiting from its UML definition as an “aggregation of elements”. Re- 
lationships are represented as “UML associations” being labeled by the meta-class name, 
which defines the name of the relationship. For example, the Use Case Diagram is rep- 
resented as an instance package of Behavioral Model (lower part of figure 2). The Use 
Case element is related to a Triggering Event, a Precondition, a Postcondition, a Result, a 
Goal and an Actor, which are instances of different kinds of Modeling Units. In addition, 
a Use Case shows Dependencies (Inclusion or Extension) with another Use Case. Ac- 
cording to the specification of UML language [20], a Use Case describes a service that an 
entity provides independently from internal structural details. The service is described 
as a sequence being started by an actor. The Use Case has to show possible variants, for 
example alternative sequences and exceptional behavior. According to these proposals, 
we include in our meta-model the Sequence Specification Template. The Use Case class 
is related to the Sequence Specification Template through an instance of Unit Model 
Relationship. Scenarios, workflows, Document-task Diagrams, DFDs, or Activity
Diagrams (in the UML sense) have also been described using this meta-model. The details can
be found in [15].

Once the meta-model is built, the transformation of requirements can be achieved 
using the equivalences between instances of each meta-class. For example, workflow 
activities and use-cases are related by the Sequence Specification Template that consists 
of a sequence of steps. Equivalences of this kind have been automated and used in the
transformation of a requirements asset into a comparable asset expressed in another
format (sometimes with the final decision left to an analyst).

Fig. 2. Meta-model of Software Requirements Representations, expressed in UML, and
representation of Use Case Diagrams as an instance of the meta-model.

As the starting point, we have chosen to use Case Graphs, as indicated in section 2.
This is reinforced by the analysis of the Case Graph, based on the Petri Nets formalism.
In summary, the steps of the transformation are:



1. Get a Requirements diagram (preferably using a requirements tool, see next section).

2. Model this diagram using a Case Graph.

3. Analyze this Case Graph, checking its validity and consistency.

4. Generate BUC Graphs and UC Graphs. This is an algorithmic process that finds the
sequences of nodes for each initial document/Place.

5. Refine the Graphs, transforming all the AA, OA, and OO tasks into AA tasks.

6. Factorize the Case Graph, obtaining the common structures (a modular Case Graph).

7. Extract the requirements assets to be visualized, edited, or stored in a repository.

Because several of the diagrams being considered are structurally similar, some of these
steps can turn out to be unnecessary for a particular transformation.
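Step 4 can be pictured as a path enumeration over the Case Graph. The following is a minimal Python sketch under stated assumptions, not the algorithm implemented in the tool: the graph is given as a list of arcs, an "initial" Place is assumed to be a Place with no incoming arc, and the graph is assumed to be acyclic (which step 3 checks). The function name `sequences_from_initial_places` is hypothetical.

```python
def sequences_from_initial_places(places, arcs):
    """Enumerate node sequences that start at each initial Place.

    places: iterable of Place names
    arcs:   iterable of (source, target) pairs between Places and Transitions
    Assumes the graph is acyclic; otherwise the recursion would not terminate.
    """
    successors = {}
    has_incoming = set()
    for src, dst in arcs:
        successors.setdefault(src, []).append(dst)
        has_incoming.add(dst)

    def walk(node, path):
        path = path + [node]
        following = successors.get(node, [])
        if not following:
            # A node without outgoing arcs ends one complete sequence.
            yield path
        for nxt in following:
            yield from walk(nxt, path)

    # Assumption: an initial Place is one that no arc points to.
    initial = [p for p in sorted(places) if p not in has_incoming]
    return {p: list(walk(p, [])) for p in initial}


if __name__ == "__main__":
    arcs = [
        ("order", "check"), ("check", "checked"),
        ("checked", "ship"), ("checked", "bill"),
        ("ship", "done"), ("bill", "done"),
    ]
    for start, paths in sequences_from_initial_places(
        {"order", "checked", "done"}, arcs
    ).items():
        for path in paths:
            print(start, "->", " > ".join(path))
```

Each returned sequence alternates Places and Transitions; how such sequences are grouped into BUC and UC Graphs is outside the scope of this sketch.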

The second step uses the correspondence between meta-classes of the proposed 
model and the elements of the Case Graph. Here are some examples: 

- Subject, which matches Actors (responsible) in a Case Graph. This category includes
swimlanes of Activity diagrams, workflow applications, and external/internal actors
of Document-task diagrams (all of them are instances of the meta-class Subject).

- Activity, as Transitions of a Case Graph. This meta-class includes the activities of a 
Workflow or Activity diagram, or tasks of a Document-task diagram. 

- Constraint and Connector, corresponding to Places of a Case Graph. This category
includes documents of a Document-task diagram, objects of an Activity diagram, and
relevant data associated with workflows.
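The correspondences listed above can be sketched as a simple lookup. The Python fragment below is illustrative only: the table `META_CLASS_TO_CASE_GRAPH`, the `CaseGraph` class, and the function `to_case_graph` are hypothetical names, and the sketch covers only the classification of elements, not the arcs between them.

```python
from dataclasses import dataclass, field

# Hypothetical lookup table: meta-class name -> kind of Case Graph element,
# following the three correspondences given in the text.
META_CLASS_TO_CASE_GRAPH = {
    "Subject": "Actor",
    "Activity": "Transition",
    "Constraint": "Place",
    "Connector": "Place",
}


@dataclass
class CaseGraph:
    actors: set = field(default_factory=set)
    transitions: set = field(default_factory=set)
    places: set = field(default_factory=set)


def to_case_graph(elements):
    """Classify (name, meta_class) pairs into Case Graph elements.

    Elements whose meta-class has no counterpart in the table are returned
    separately, so that an analyst can deal with them by hand.
    """
    graph = CaseGraph()
    buckets = {
        "Actor": graph.actors,
        "Transition": graph.transitions,
        "Place": graph.places,
    }
    unmapped = []
    for name, meta_class in elements:
        kind = META_CLASS_TO_CASE_GRAPH.get(meta_class)
        if kind is None:
            unmapped.append((name, meta_class))
        else:
            buckets[kind].add(name)
    return graph, unmapped


if __name__ == "__main__":
    # Elements of an invented Document-task diagram.
    graph, unmapped = to_case_graph([
        ("Clerk", "Subject"),
        ("Register order", "Activity"),
        ("Order form", "Constraint"),
    ])
    print(graph, unmapped)
```

Returning the unmapped elements rather than discarding them reflects the manual work that remains when a diagram uses a concept with no equivalent.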

Some of the meta-classes are not instantiated when a particular diagram does not
use the equivalent concept. This is a limitation inherent to the technique and implies
some manual work in the transformation. The work carried out until now has allowed
Activity diagrams, Workflows, and Document-task diagrams to be transformed into Case
Graphs. The peculiarities of each transformation have been identified and incorporated
algorithmically into the R² tool, described in the next section.

Once the original diagram is transformed into a Case Graph, its consistency and
validity must be analyzed: the Case Graph must be acyclic, all the nodes must be
reachable, and all the activities must have an associated Actor. Steps 4, 5, and 6 allow
the BUC and UC Graphs, the AA-type transformed Graph, and the modular Case Graph to be
obtained automatically.
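The three validity conditions can be checked with standard graph algorithms. The sketch below is one possible Python rendering, not the check implemented in the tool. It assumes that the Case Graph is given as sets of Places and Transitions plus a list of arcs whose endpoints all belong to those sets, that reachability is measured from Places with no incoming arc, and that `actor_of` maps each Transition to its Actor; all of these names are hypothetical.

```python
from collections import deque


def check_case_graph(places, transitions, arcs, actor_of):
    """Return a list of problems; an empty list means all checks pass."""
    nodes = set(places) | set(transitions)
    successors = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for src, dst in arcs:
        successors[src].append(dst)
        indegree[dst] += 1

    problems = []

    # 1. Acyclicity (Kahn's algorithm): if a topological ordering cannot
    #    consume every node, the leftover nodes lie on or behind a cycle.
    remaining = dict(indegree)
    queue = deque(n for n in nodes if remaining[n] == 0)
    ordered = 0
    while queue:
        node = queue.popleft()
        ordered += 1
        for nxt in successors[node]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                queue.append(nxt)
    if ordered != len(nodes):
        problems.append("graph contains a cycle")

    # 2. Reachability, assuming the start nodes are Places with no incoming arc.
    start = [p for p in places if indegree[p] == 0]
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    for node in sorted(nodes - seen):
        problems.append(f"unreachable node: {node}")

    # 3. Every activity (Transition) must have an associated Actor.
    for transition in sorted(transitions):
        if not actor_of.get(transition):
            problems.append(f"activity without Actor: {transition}")

    return problems


if __name__ == "__main__":
    print(check_case_graph(
        {"order", "invoice"}, {"bill"},
        [("order", "bill"), ("bill", "invoice")],
        {"bill": "Clerk"},
    ))
```

Collecting all problems instead of stopping at the first one lets an analyst repair the diagram in a single pass.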

The last step is devoted to the generation of the requirements asset in the desired
format. It is specific to each type of diagram. For example, an Activity diagram
is obtained from a Case Graph with the corresponding equivalences: Swimlanes are
generated from Actors, Activities from Transitions, and Connectors (Linear, Joint, or
Split) from Places, depending on the type of Place. The particular algorithms implemented
in the R² tool for each diagram are detailed in [6].
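As an illustration of this last step, the following Python sketch derives Swimlanes and Connectors from a Case Graph. It is not the algorithm of [6]. In particular, the rule used to decide the type of a Place is an assumption made for the example: a Place with more than one outgoing arc is treated as Split, one with more than one incoming arc as Joint, and any other as Linear (Split takes precedence when both hold). The function names are hypothetical.

```python
def connector_kind(n_in, n_out):
    """Classify a Place by its arc counts (assumed rule, not from the source)."""
    if n_out > 1:
        return "Split"
    if n_in > 1:
        return "Joint"
    return "Linear"


def to_activity_diagram(actor_of, places, arcs):
    """Derive Activity diagram parts from a Case Graph.

    actor_of: dict mapping each Transition to its Actor
    places:   iterable of Place names
    arcs:     iterable of (source, target) pairs
    """
    places = list(places)

    # Swimlanes come from Actors; each holds the Activities (Transitions)
    # that the Actor is responsible for.
    swimlanes = {}
    for transition, actor in actor_of.items():
        swimlanes.setdefault(actor, []).append(transition)

    # Count incoming and outgoing arcs per Place to pick the Connector type.
    n_in = {p: 0 for p in places}
    n_out = {p: 0 for p in places}
    for src, dst in arcs:
        if src in n_out:
            n_out[src] += 1
        if dst in n_in:
            n_in[dst] += 1
    connectors = {p: connector_kind(n_in[p], n_out[p]) for p in places}

    return {"swimlanes": swimlanes, "connectors": connectors}


if __name__ == "__main__":
    diagram = to_activity_diagram(
        {"check": "Clerk", "ship": "Warehouse", "bill": "Clerk"},
        ["order", "checked", "done"],
        [("order", "check"), ("check", "checked"), ("checked", "ship"),
         ("checked", "bill"), ("ship", "done"), ("bill", "done")],
    )
    print(diagram)
```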

4 Tool Support 

To be successful, this approach to requirements reuse needs some tools that support 
the new activities defined. We initially developed an asset repository that implements 
the reuse model. The main interest of this model is the established traceability between 
requirements, designs and code. Other repository engines that manage coarse-grained 
reusable assets can be adapted to support the model. 

The building and transformation of requirements is supported by R², a prototype system
implemented using Oracle and Java. This environment is composed of five main elements:
the User Interface; the Requirements Editor, which supplies the functionality needed to
create and modify requirements diagrams; the Data Manager, which allows the
requirements information to be stored, classified, retrieved and updated; the Repository,
which physically contains the information related to requirements diagrams; and Data
Exchange, a module that allows the information to be directed to external applications.
These tools have some novel characteristics; for example, the last module allows Petri net
tools to be connected in order to verify the logical consistency of diagrams.
Additionally, the transformation of some types of requirements into others is implemented,
in particular starting from workflow diagrams and obtaining the types of diagrams
mentioned in the previous section. Finally, a “light version” of the R² tool (based on a
personal database) is available from the GIRO site (http://giro.infor.uva.es).




5 Conclusions and Future Work 

Several approaches for the treatment of software requirements, in the context of
systematic reuse, have been presented in this paper. Although they have not yet been
evaluated experimentally (this is left for future work), the reuse of standardized
individual requirements has been shown to be plausible. Obtaining requirements assets
automatically guarantees their standardization in the repository.

The development of complex software systems requires diverse modeling techniques.
We have investigated how to manage the richness of units and relationships
represented in these techniques in order to establish a requirements reuse
framework. In this paper we propose a requirements meta-model that is based on the same
language as UML and describes modeling information that is useful in the requirements
reuse context. The contribution of this paper is an approach to requirements reuse in
domains where the requirements information has been represented in diverse semiformal
models. For this reason, we have not addressed formal and non-formal models in this paper.

The meta-model allows every requirements element from the modeling techniques
we have investigated to be identified as an instance of some requirements meta-class.
In this way, the meta-model supports the integration of different Requirements
Representations. Nevertheless, empirical research still has to be conducted to evaluate
the results of applying requirements reuse in experimental domains. Thus, our immediate
work will apply our Model and our Requirements Reuse prototype system to generating new
requirements specifications in a specific domain (we are working in the Disability tools
domain). We expect that reusing requirements within our reuse framework will provide
better requirements specifications, and that requirements engineers will be able to focus
on decisions about requirements regardless of the details of definition and documentation.
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